print("Hello World test")
Note that there are no semicolons at the end.
a = 7
print(a)
print(7 + 8)
print(a + 7)
b = 7.0
print(b)
c = 'hello'
print(c)
c = "hello"
print(c)
x = True
y = False
z = None  # similar to null in Java
d = []
d.append(5)
print(d)
e = ['a','b','c','d','e']
print(e[3])
f = (1,2,3)
print(f)
x = 2
print(x == 2) # prints out True
print(x == 3) # prints out False
print(x < 3) # prints out True
if x == 2:
    print("x is two")
elif x == 3:
    print("x is three")
else:
    print("x is not two nor three")
name = "John"
age = 23
if name == "John" and age == 23:
    print("Your name is John, and you are also 23 years old.")
if name == "John" or name == "Rick":
    print("Your name is either John or Rick.")
The "in" operator can be used to check whether a specified object exists within an iterable container, such as a list:
name = "John"
if name in ["John", "Rick"]:
    print("Your name is either John or Rick.")
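The "in" operator is not limited to lists. A minimal sketch with made-up data (the names `names` and `ages` are only for illustration) showing how it behaves for strings and dictionaries:

```python
# `in` works on any container that supports membership tests, not only lists
names = ["John", "Rick"]
ages = {"John": 23, "Rick": 31}  # hypothetical example data

print("John" in names)  # True
print("Anna" in names)  # False
print("oh" in "John")   # True: for strings, `in` checks for a substring
print("John" in ages)   # True: for dictionaries, `in` checks the keys
print(23 in ages)       # False: the values are not checked
```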
Using "not" before a boolean expression inverts it:
print(not False)
primes = [2, 3, 5, 7]
for prime in primes:
    print(prime)
To get behaviour similar to a counting for loop in Java, you can use range:
for i in range(5):
    print(i)
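range() also accepts a start value and a step, which covers most of the loops you would write with a counter in Java. A small sketch (converted to lists with list() so the values are visible):

```python
# range(stop) counts from 0 up to, but not including, stop
up_to_five = list(range(5))        # [0, 1, 2, 3, 4]
# range(start, stop) starts somewhere other than 0
two_to_five = list(range(2, 6))    # [2, 3, 4, 5]
# range(start, stop, step), like `for (int i = 0; i < 10; i += 3)` in Java
every_third = list(range(0, 10, 3))  # [0, 3, 6, 9]
# a negative step counts down, like `for (int i = 5; i > 0; i--)`
countdown = list(range(5, 0, -1))  # [5, 4, 3, 2, 1]

print(up_to_five, two_to_five, every_third, countdown)
```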
count = 0
while count < 5:
    print(count)
    count += 1  # This is the same as count = count + 1
count = 0
while True:
    print(count)
    count += 1
    if count >= 5:  # leave the otherwise infinite loop with break
        break
As a first example we will load a dataset (csv) from a URL.
As a first step, we have to import pandas:
import pandas as pd
Now you can access pandas with 'pd'.
Datasets are loaded with one of pandas' reader functions, such as pd.read_csv(), pd.read_excel() or pd.read_json().
Loading a dataset from the filesystem is as easy as placing the corresponding file next to the notebook and passing the filename as the parameter, like this:
pd.read_csv('myfile.csv')
tips_dataset = pd.read_csv('http://tinyurl.com/tips-csv')
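pd.read_csv() accepts not only a filename or a URL but also any file-like object. A minimal sketch that uses an in-memory string (made-up rows, wrapped in io.StringIO) instead of a real file, which is handy for quick experiments without a network connection:

```python
import io

import pandas as pd

# Made-up sample rows standing in for the contents of a CSV file
csv_text = """total_bill,tip,day
16.99,1.01,Sun
10.34,1.66,Sun
21.01,3.50,Sat
"""

small_dataset = pd.read_csv(io.StringIO(csv_text))
print(small_dataset)
print(small_dataset.dtypes)  # pandas infers float64 for the numeric columns
```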
If you only want to see a few data points, you can use the .head() function (shows the top rows) or the .tail() function (shows the bottom rows):
tips_dataset.head()
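Both functions take an optional number of rows, with 5 as the default. A small sketch on a made-up DataFrame:

```python
import pandas as pd

# Made-up data with 7 rows
df = pd.DataFrame({"total_bill": [10.0, 12.5, 8.0, 20.0, 15.0, 30.0, 9.5]})

first_five = df.head()   # default: the first 5 rows
first_two = df.head(2)   # the first 2 rows
last_three = df.tail(3)  # the last 3 rows

print(first_five)
print(first_two)
print(last_three)
```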
On the left of each row you see increasing numbers. This is the index. It is not contained in the file, but helps pandas to join datasets.
tips_dataset.index
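To see why the index matters when combining data, here is a minimal sketch with two made-up tables whose rows are listed in a different order. DataFrame.join() matches rows by their index labels, not by their position:

```python
import pandas as pd

# Two made-up tables sharing the index labels 0, 1, 2 in a different order
bills = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01]}, index=[0, 1, 2])
tips = pd.DataFrame({"tip": [3.50, 1.01, 1.66]}, index=[2, 0, 1])

# join() aligns the rows of `tips` to the index of `bills`
combined = bills.join(tips)
print(combined)
```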
To get an overview of your data (at least numeric data) you can also use the .describe() function.
tips_dataset.describe()
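By default .describe() only summarises numeric columns. A small sketch on made-up data showing the default behaviour and the include="all" option, which also summarises text columns (count, unique, top, freq):

```python
import pandas as pd

df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "day": ["Sun", "Sat", "Sun", "Fri"],  # a non-numeric column
})

# Only the numeric column appears in the default summary
summary = df.describe()
print(summary)

# include="all" adds statistics for the text column as well
full_summary = df.describe(include="all")
print(full_summary)
```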
Sorting is done with .sort_values(by='column'):
tips_dataset_sorted = tips_dataset.sort_values(by='total_bill')
tips_dataset_sorted
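.sort_values() can also sort in descending order and by several columns at once. A short sketch on made-up data; note that each row keeps its original index label after sorting:

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["Sun", "Sat", "Sun", "Sat"],
    "total_bill": [16.99, 20.65, 10.34, 35.26],
})

# Largest bills first
by_bill_desc = df.sort_values(by="total_bill", ascending=False)
print(by_bill_desc)

# First by day (alphabetically), then by total_bill within each day
by_day_then_bill = df.sort_values(by=["day", "total_bill"])
print(by_day_then_bill)
```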
Selecting a single column:
tips_dataset_sorted['day']
Selecting multiple columns:
tips_dataset_sorted[['day','total_bill']]
Using a single column’s values to select data.
tips_dataset_sorted[ tips_dataset_sorted['total_bill'] > 10.0 ]
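A condition such as tips_dataset_sorted['total_bill'] > 10.0 produces a boolean Series. Several conditions can be combined with & (and), | (or) and ~ (not); Python's own and/or do not work element-wise on Series. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "total_bill": [8.0, 15.0, 25.0, 40.0],
    "day": ["Sun", "Sat", "Sun", "Fri"],
})

# The parentheses are required because & binds more tightly than > and ==
big_sunday = df[(df["total_bill"] > 10.0) & (df["day"] == "Sun")]
print(big_sunday)

# isin() is the column-wise counterpart of Python's `in`
weekend = df[df["day"].isin(["Sat", "Sun"])]
print(weekend)
```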
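By “group by” we are referring to a process involving one or more of the following steps: splitting the data into groups based on some criteria, applying a function to each group independently, and combining the results into a data structure. The splitting is done with .groupby():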
tips_dataset_sorted.groupby('day')
The result of the groupby function is just an intermediate result; you have to decide how to "combine" the values within each group:
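# numeric_only=True skips text columns such as 'sex' and 'smoker' (mean() raises a TypeError on them in pandas 2.x)
tips_dataset_sorted.groupby('day').mean(numeric_only=True)
tips_dataset_sorted.groupby('day').sum(numeric_only=True)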
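Instead of a single function such as mean() or sum(), the combine step can also apply several aggregations at once with .agg(). A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["Sun", "Sat", "Sun", "Sat", "Fri"],
    "total_bill": [10.0, 20.0, 30.0, 40.0, 5.0],
    "tip": [1.0, 3.0, 4.0, 6.0, 1.0],
})

# Split by day, apply three aggregations to one column, combine into a table
per_day = df.groupby("day")["total_bill"].agg(["count", "mean", "sum"])
print(per_day)

# A different aggregation for each column
mixed = df.groupby("day").agg({"total_bill": "sum", "tip": "max"})
print(mixed)
```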
Note that in the resulting dataset the day column becomes the index.
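If you would rather keep the group key as an ordinary column, there are two common options: call .reset_index() on the result, or pass as_index=False to groupby(). A small sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["Sun", "Sat", "Sun"],
    "total_bill": [10.0, 20.0, 30.0],
})

totals = df.groupby("day").sum()
print(totals.index)  # the day values are now the index

# Option 1: turn the index back into a column
flat = totals.reset_index()
print(flat.columns.tolist())  # ['day', 'total_bill']

# Option 2: tell groupby not to use the group keys as the index
flat2 = df.groupby("day", as_index=False).sum()
print(flat2)
```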
For plotting we need an additional import statement:
import matplotlib.pyplot as plt
Afterwards we can call the .plot() method:
tips_dataset['total_bill'].plot()
plt.show()
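Plotting methods allow for a handful of plot styles other than the default line plot. The style is selected with the kind keyword argument to plot(). The available kinds include 'bar' or 'barh' for bar plots, 'hist' for histograms, 'box' for box plots, 'kde' or 'density' for density plots, 'area' for area plots, 'scatter' for scatter plots, 'hexbin' for hexagonal bin plots and 'pie' for pie charts: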
tips_dataset['total_bill'].plot(kind='hist')
plt.show()
tips_dataset.plot(kind='scatter', x='total_bill', y='tip')
plt.show()
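Plot kinds combine well with the groupby results from earlier. A minimal sketch on made-up data: aggregate first, then draw one bar per day with kind='bar':

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "day": ["Sun", "Sat", "Sun", "Fri"],
    "total_bill": [10.0, 20.0, 30.0, 5.0],
})

# One bar per day, showing the sum of total_bill
ax = df.groupby("day")["total_bill"].sum().plot(kind="bar")
ax.set_ylabel("sum of total_bill")
plt.show()
```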
huge_tips = tips_dataset[tips_dataset['tip'] > 5]
huge_tips.head()
Matplotlib also provides an annotation function, plt.annotate(), which lets you add text to individual points (here, the weekday):
huge_tips.plot(kind='scatter', x='total_bill', y='tip')
for index, total_bill, tip, sex, smoker, day, time, size in huge_tips.itertuples():
    plt.annotate(
        day,  # text to print
        (total_bill, tip)  # position in (x, y)
    )
plt.show()
Plot categorical data:
plt.figure(figsize=(10, 8))  # make the image a bit bigger
for name, group in tips_dataset.groupby('day'):
    plt.scatter(group['total_bill'], group['tip'], label=name)
# set the axis labels
plt.xlabel("total_bill")
plt.ylabel("tip")
plt.legend()
plt.show()