Intro to Python, Pandas and a bit of Matplotlib¶

Printing¶

To print a string or any other object, call print(obj):

print("Hello World test")

Hello World test

Note that there are no semicolons at the end.
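print also accepts several arguments and a separator. The following is a small sketch of that; the variable names name, count and message are only illustrative, and the f-string syntax assumes Python 3.6 or newer.

```python
name = "World"
count = 3

print("Hello", name)             # several arguments are joined with a space
print("Hello", name, sep=", ")   # sep changes the separator

# An f-string inserts the values of variables into a string.
message = f"{count} greetings for {name}"
print(message)
```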

Variables and Types¶

Python is object oriented and dynamically typed, not "statically typed"
Variables are created by assignment, without declaring their type
Every variable refers to an object
Variables and functions are named 'lowercased_with_underscores'
There are different types of objects
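To illustrate the points above, here is a minimal sketch showing that the type belongs to the object and not to the variable. The names value, first_type and second_type are made up for this example.

```python
# The same name can be bound to objects of different types.
value = 7
first_type = type(value).__name__    # name of the type of the current object

value = "seven"
second_type = type(value).__name__

print(first_type, second_type)

# Every value is an object, so even an integer has methods.
print((7).bit_length())   # number of bits needed to represent 7
```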

Numbers¶

a = 7
print(a)
print(7 + 8)

print(a + 7)

7
15
14

Floating point numbers¶

b = 7.0
print(b)

7.0

Strings¶

c = 'hello'
print(c)
c = "hello"
print(c)

hello
hello

Boolean values / None¶

x = True
y = False

z = None # similar to null in Java

Lists (mutable)¶

d = []
d.append(5)
print(d)

e = ['a','b','c','d','e']
print(e[3])

[5]
d
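A few more list operations, as a short sketch. The list name letters is only used for illustration; indices start at 0 and negative indices count from the end.

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(len(letters))    # number of elements
print(letters[0])      # first element
print(letters[-1])     # last element
print(letters[1:3])    # slice with the elements at index 1 and 2

letters[0] = 'z'       # lists are mutable, so an element can be replaced
print(letters)
```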

Tuple (immutable)¶

f = (1,2,3)
print(f)

(1, 2, 3)
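What "immutable" means in practice can be sketched as follows: assigning to an element of a tuple raises a TypeError. The names point, first, second and third are illustrative only.

```python
point = (1, 2, 3)

try:
    point[0] = 10                 # not allowed for tuples
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# A tuple can be unpacked into separate variables.
first, second, third = point
print(first + second + third)
```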

Conditions¶

Simple Conditions¶

x = 2
print(x == 2) # prints out True
print(x == 3) # prints out False
print(x < 3) # prints out True

True
False
True

If Statement¶

if x == 2:
    print("x is two")
elif x==3:
    print("x is three")
else:
    print("x is not two nor three")

x is two

Python uses indentation to define code blocks, instead of curly braces
- no '{', '}' any more :-)

Boolean operators¶

name = "John"
age = 23
if name == "John" and age == 23:
    print("Your name is John, and you are also 23 years old.")

if name == "John" or name == "Rick":
    print("Your name is either John or Rick.")

Your name is John, and you are also 23 years old.
Your name is either John or Rick.

"In" Operator¶

The "in" operator can be used to check whether an object is contained in an iterable container, such as a list:

name = "John"
if name in ["John", "Rick"]:
    print("Your name is either John or Rick.")

Your name is either John or Rick.
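The "in" operator is not limited to lists. The sketch below applies it to a string and a tuple and shows the negated form "not in"; the values are made up for illustration.

```python
print("oh" in "John")                    # substring test on a string
print(3 in (1, 2, 3))                    # membership test on a tuple
print("Paul" not in ["John", "Rick"])    # "not in" negates the test
```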

"not" operator¶

Using "not" before a boolean expression inverts it:

print(not False)

True

Loops¶

For loop¶

primes = [2, 3, 5, 7]
for prime in primes:
    print(prime)

To get behaviour similar to a counting for loop in Java, you can use range:

for i in range(5):
    print(i)
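range also accepts a start value and a step, and enumerate provides the position together with each element. A short sketch:

```python
print(list(range(5)))          # 0 up to, but not including, 5
print(list(range(2, 6)))       # start at 2, stop before 6
print(list(range(0, 10, 3)))   # every third number

primes = [2, 3, 5, 7]
for position, prime in enumerate(primes):
    print(position, prime)     # position starts at 0
```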

While loops¶

count = 0
while count < 5:
    print(count)
    count += 1  # This is the same as count = count + 1

Break and continue¶

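break and continue work as in other programming languages.
There is no do-while loop in Python, but you can emulate one with while True and break:

count = 0
while True:
    print(count)  # the body runs at least once
    count += 1
    if count >= 5:
        break     # leave the loop once the condition no longer holds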
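As a small sketch of the difference: continue skips the rest of the current iteration, while break leaves the loop entirely. The list name collected is only illustrative.

```python
collected = []
for number in range(10):
    if number % 2 == 1:
        continue      # skip odd numbers and go on with the next iteration
    if number > 6:
        break         # stop the loop as soon as a number above 6 is reached
    collected.append(number)

print(collected)
```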

That's it for the moment - continue with pandas¶

Load data¶

As a first example we will load a dataset (csv) from a URL.

But as a first step we have to import Pandas:

import pandas as pd

Now you can access pandas with 'pd'.

Loading datasets is done by one of the following functions:

pd.read_csv function
pd.read_excel function
for an overview see https://pandas.pydata.org/pandas-docs/stable/io.html

Loading a dataset from the filesystem is as easy as placing the file next to the notebook and passing the filename as a parameter:

pd.read_csv('myfile.csv')

tips_dataset = pd.read_csv('http://tinyurl.com/tips-csv')
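pd.read_csv also accepts file-like objects, which is handy for trying it out without a file or network access. The sketch below uses a made-up three-row CSV text (csv_text and small_dataset are hypothetical names, and the rows only imitate the structure of the tips data).

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as 'myfile.csv'.
csv_text = """total_bill,tip,day
16.99,1.01,Sun
10.34,1.66,Sun
21.01,3.50,Sat
"""

small_dataset = pd.read_csv(io.StringIO(csv_text))
print(small_dataset.shape)    # (number of rows, number of columns)
print(small_dataset.dtypes)   # column types inferred by pandas
```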

Viewing Data¶

If you want to see only a few datapoints, you can use the .head() function (shows the top rows) or the .tail() function (shows the bottom rows).

tips_dataset.head()

On the left of each row you see increasing numbers. This is the index. It is not contained in the file; pandas creates it while loading and uses it to identify rows, for example when joining datasets.

tips_dataset.index

RangeIndex(start=0, stop=244, step=1)
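The index labels are not the same thing as row positions. The following sketch, using a made-up DataFrame named bills with explicit index labels, shows label-based access with .loc and position-based access with .iloc.

```python
import pandas as pd

# Hypothetical data with explicit index labels instead of 0, 1, 2.
bills = pd.DataFrame(
    {"total_bill": [16.99, 10.34, 21.01]},
    index=[67, 92, 111],
)

print(bills.index.tolist())          # the index labels
print(bills.loc[92, "total_bill"])   # select by index label
print(bills.iloc[0, 0])              # select by position (first row, first column)
```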

To get an overview of your data (at least numeric data) you can also use the .describe() function.

tips_dataset.describe()

Sorting¶

Sorting is done by .sort_values(by='column')

tips_dataset_sorted = tips_dataset.sort_values(by='total_bill')
tips_dataset_sorted
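sort_values returns a new DataFrame and leaves the original untouched. It also accepts ascending=False and a list of columns. A minimal sketch on made-up data (the DataFrame bills is hypothetical):

```python
import pandas as pd

bills = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 10.34],
    "tip": [1.01, 1.66, 3.50, 2.00],
})

# Largest bill first.
by_bill_desc = bills.sort_values(by="total_bill", ascending=False)
print(by_bill_desc)

# Sort by total_bill, and by tip where the bills are equal.
by_bill_then_tip = bills.sort_values(by=["total_bill", "tip"])
print(by_bill_then_tip)
```

Note that each row keeps its original index label after sorting.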

Selection¶

Selecting a single column:

tips_dataset_sorted['day']

67      Sat
92      Fri
111     Sat
172     Sun
149    Thur
195    Thur
218     Sat
145    Thur
135    Thur
126    Thur
222     Fri
6       Sun
30      Sat
178     Sun
43      Sun
148    Thur
53      Sun
235     Sat
82     Thur
226     Fri
10      Sun
51      Sun
16      Sun
136    Thur
1       Sun
196    Thur
75      Sat
168     Sat
169     Sat
117    Thur
       ... 
44      Sun
187     Sun
39      Sat
167     Sun
173     Sun
47      Sun
83     Thur
237     Sat
175     Sun
141    Thur
179     Sun
180     Sun
52      Sun
85     Thur
11      Sun
238     Sat
56      Sat
112     Sun
207     Sat
23      Sat
95      Fri
184     Sun
142    Thur
197    Thur
102     Sat
182     Sun
156     Sun
59      Sat
212     Sat
170     Sat
Name: day, Length: 244, dtype: object

Selecting multiple columns:

tips_dataset_sorted[['day','total_bill']]
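The two kinds of selection return different types: a single column name gives a Series, a list of column names gives a DataFrame. A short sketch on a made-up DataFrame named bills:

```python
import pandas as pd

bills = pd.DataFrame({
    "day": ["Sun", "Sat", "Thur"],
    "total_bill": [16.99, 21.01, 10.34],
    "tip": [1.01, 3.50, 1.66],
})

one_column = bills["day"]                    # single name -> Series
two_columns = bills[["day", "total_bill"]]   # list of names -> DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
```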

Boolean Indexing / Filtering¶

Using a single column’s values to select data.

tips_dataset_sorted[tips_dataset_sorted['total_bill'] > 10.0]
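Several conditions can be combined with & (and) and | (or). Each condition needs its own parentheses, because & and | bind more tightly than the comparison operators. A sketch on made-up data (the DataFrame bills is hypothetical):

```python
import pandas as pd

bills = pd.DataFrame({
    "total_bill": [16.99, 9.60, 21.01, 10.34],
    "day": ["Sun", "Sun", "Sat", "Thur"],
})

# Rows where both conditions hold.
sunday_over_ten = bills[(bills["total_bill"] > 10.0) & (bills["day"] == "Sun")]
print(sunday_over_ten)

# Rows where at least one condition holds.
big_or_thursday = bills[(bills["total_bill"] > 20.0) | (bills["day"] == "Thur")]
print(big_or_thursday)
```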

Grouping¶

By “group by” we are referring to a process involving one or more of the following steps:

Splitting the data into groups based on some criteria
Applying a function to each group independently
Combining the results into a data structure

tips_dataset_sorted.groupby('day')

<pandas.core.groupby.DataFrameGroupBy object at 0x000002492C4AC7F0>

The result of the groupby function is just an intermediate result. You have to decide how to "combine" the values within each group, for example with:

sum()
mean()
std()
min()
max()

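# numeric_only=True limits the aggregation to the numeric columns
tips_dataset_sorted.groupby('day').mean(numeric_only=True)

tips_dataset_sorted.groupby('day').sum(numeric_only=True)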

Note that in the resulting dataset the day column has become the index.
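The split, apply and combine steps can be tried on a tiny example. The sketch below uses a made-up DataFrame named bills; it aggregates one column, applies different functions per column with .agg, and turns the group key back into a regular column with .reset_index().

```python
import pandas as pd

bills = pd.DataFrame({
    "day": ["Sat", "Sun", "Sat", "Sun"],
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
})

# Mean of a single column per group.
mean_per_day = bills.groupby("day")["total_bill"].mean()
print(mean_per_day)

# A different aggregation function for each column.
summary = bills.groupby("day").agg({"total_bill": "sum", "tip": "max"})
print(summary)

# Move the group key from the index back into a column.
flat = summary.reset_index()
print(flat)
```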

Plotting¶

For plotting we need an additional import statement:

import matplotlib.pyplot as plt

Afterwards we can call the function .plot()

tips_dataset['total_bill'].plot()
plt.show()

Plotting methods allow for a handful of plot styles other than the default Line plot. These methods can be provided as the kind keyword argument to plot(). These include:

‘bar’ or ‘barh’ for bar plots
‘hist’ for histogram
‘box’ for boxplot
‘kde’ or ‘density’ for density plots
‘area’ for area plots
‘scatter’ for scatter plots
‘hexbin’ for hexagonal bin plots
‘pie’ for pie plots

tips_dataset['total_bill'].plot(kind='hist')
plt.show()

tips_dataset.plot(kind='scatter', x='total_bill', y='tip')
plt.show()
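As a further sketch of the kind argument, the example below draws a bar plot from a small Series. The Series name tips_per_day is made up; its values are the per-day tip sums of the tips dataset. .plot() returns a matplotlib Axes object, which can be used to adjust labels.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sum of tips per day, entered by hand for this sketch.
tips_per_day = pd.Series(
    [51.96, 260.40, 247.39, 171.83],
    index=["Fri", "Sat", "Sun", "Thur"],
)

ax = tips_per_day.plot(kind="bar")   # one bar per index entry
ax.set_ylabel("sum of tips")
```

In a notebook, call plt.show() afterwards as in the examples above.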

huge_tips = tips_dataset[tips_dataset['tip'] > 5]
huge_tips.head()

There is also a nice annotation function which allows you to add text to each point, for example the weekday:

huge_tips.plot(kind='scatter', x='total_bill', y='tip')

for index, total_bill, tip, sex, smoker, day, time, size in huge_tips.itertuples():
    plt.annotate(
        day, # text to print
        (total_bill, tip) # position in (x, y)
    )
    
plt.show()

Plot categorical data:

plt.figure(figsize=(10, 8))  # make the image a bit bigger

for name, group in tips_dataset.groupby('day'):
    plt.scatter(group['total_bill'], group['tip'], label=name)

# set the axis labels
plt.xlabel("total_bill")
plt.ylabel("tip")
plt.legend()

plt.show()

Output of tips_dataset.head():
	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

Output of tips_dataset.describe():
	total_bill	tip	size
count	244.000000	244.000000	244.000000
mean	19.785943	2.998279	2.569672
std	8.902412	1.383638	0.951100
min	3.070000	1.000000	1.000000
25%	13.347500	2.000000	2.000000
50%	17.795000	2.900000	2.000000
75%	24.127500	3.562500	3.000000
max	50.810000	10.000000	6.000000

Output of tips_dataset_sorted:
	total_bill	tip	sex	smoker	day	time	size
67	3.07	1.00	Female	Yes	Sat	Dinner	1
92	5.75	1.00	Female	Yes	Fri	Dinner	2
111	7.25	1.00	Female	No	Sat	Dinner	1
172	7.25	5.15	Male	Yes	Sun	Dinner	2
149	7.51	2.00	Male	No	Thur	Lunch	2
195	7.56	1.44	Male	No	Thur	Lunch	2
218	7.74	1.44	Male	Yes	Sat	Dinner	2
145	8.35	1.50	Female	No	Thur	Lunch	2
135	8.51	1.25	Female	No	Thur	Lunch	2
126	8.52	1.48	Male	No	Thur	Lunch	2
222	8.58	1.92	Male	Yes	Fri	Lunch	1
6	8.77	2.00	Male	No	Sun	Dinner	2
30	9.55	1.45	Male	No	Sat	Dinner	2
178	9.60	4.00	Female	Yes	Sun	Dinner	2
43	9.68	1.32	Male	No	Sun	Dinner	2
148	9.78	1.73	Male	No	Thur	Lunch	2
53	9.94	1.56	Male	No	Sun	Dinner	2
235	10.07	1.25	Male	No	Sat	Dinner	2
82	10.07	1.83	Female	No	Thur	Lunch	1
226	10.09	2.00	Female	Yes	Fri	Lunch	2
10	10.27	1.71	Male	No	Sun	Dinner	2
51	10.29	2.60	Female	No	Sun	Dinner	2
16	10.33	1.67	Female	No	Sun	Dinner	3
136	10.33	2.00	Female	No	Thur	Lunch	2
1	10.34	1.66	Male	No	Sun	Dinner	3
196	10.34	2.00	Male	Yes	Thur	Lunch	2
75	10.51	1.25	Male	No	Sat	Dinner	2
168	10.59	1.61	Female	Yes	Sat	Dinner	2
169	10.63	2.00	Female	Yes	Sat	Dinner	2
117	10.65	1.50	Female	No	Thur	Lunch	2
...	...	...	...	...	...	...	...
44	30.40	5.60	Male	No	Sun	Dinner	4
187	30.46	2.00	Male	Yes	Sun	Dinner	5
39	31.27	5.00	Male	No	Sat	Dinner	3
167	31.71	4.50	Male	No	Sun	Dinner	4
173	31.85	3.18	Male	Yes	Sun	Dinner	2
47	32.40	6.00	Male	No	Sun	Dinner	4
83	32.68	5.00	Male	Yes	Thur	Lunch	2
237	32.83	1.17	Male	Yes	Sat	Dinner	2
175	32.90	3.11	Male	Yes	Sun	Dinner	2
141	34.30	6.70	Male	No	Thur	Lunch	6
179	34.63	3.55	Male	Yes	Sun	Dinner	2
180	34.65	3.68	Male	Yes	Sun	Dinner	4
52	34.81	5.20	Female	No	Sun	Dinner	4
85	34.83	5.17	Female	No	Thur	Lunch	4
11	35.26	5.00	Female	No	Sun	Dinner	4
238	35.83	4.67	Female	No	Sat	Dinner	3
56	38.01	3.00	Male	Yes	Sat	Dinner	4
112	38.07	4.00	Male	No	Sun	Dinner	3
207	38.73	3.00	Male	Yes	Sat	Dinner	4
23	39.42	7.58	Male	No	Sat	Dinner	4
95	40.17	4.73	Male	Yes	Fri	Dinner	4
184	40.55	3.00	Male	Yes	Sun	Dinner	2
142	41.19	5.00	Male	No	Thur	Lunch	5
197	43.11	5.00	Female	Yes	Thur	Lunch	4
102	44.30	2.50	Female	Yes	Sat	Dinner	3
182	45.35	3.50	Male	Yes	Sun	Dinner	3
156	48.17	5.00	Male	No	Sun	Dinner	6
59	48.27	6.73	Male	No	Sat	Dinner	4
212	48.33	9.00	Male	No	Sat	Dinner	4
170	50.81	10.00	Male	Yes	Sat	Dinner	3

Output of filtering tips_dataset_sorted for total_bill > 10.0:
	total_bill	tip	sex	smoker	day	time	size
235	10.07	1.25	Male	No	Sat	Dinner	2
82	10.07	1.83	Female	No	Thur	Lunch	1
226	10.09	2.00	Female	Yes	Fri	Lunch	2
10	10.27	1.71	Male	No	Sun	Dinner	2
51	10.29	2.60	Female	No	Sun	Dinner	2
16	10.33	1.67	Female	No	Sun	Dinner	3
136	10.33	2.00	Female	No	Thur	Lunch	2
1	10.34	1.66	Male	No	Sun	Dinner	3
196	10.34	2.00	Male	Yes	Thur	Lunch	2
75	10.51	1.25	Male	No	Sat	Dinner	2
168	10.59	1.61	Female	Yes	Sat	Dinner	2
169	10.63	2.00	Female	Yes	Sat	Dinner	2
117	10.65	1.50	Female	No	Thur	Lunch	2
233	10.77	1.47	Male	No	Sat	Dinner	2
62	11.02	1.98	Male	Yes	Sat	Dinner	2
132	11.17	1.50	Female	No	Thur	Lunch	2
58	11.24	1.76	Male	Yes	Sat	Dinner	2
100	11.35	2.50	Female	Yes	Fri	Dinner	2
128	11.38	2.00	Female	No	Thur	Lunch	2
217	11.59	1.50	Male	Yes	Sat	Dinner	2
232	11.61	3.39	Male	No	Sat	Dinner	2
120	11.69	2.31	Male	No	Thur	Lunch	2
147	11.87	1.63	Female	No	Thur	Lunch	2
70	12.02	1.97	Male	No	Sat	Dinner	2
97	12.03	1.50	Male	Yes	Fri	Dinner	2
220	12.16	2.20	Male	Yes	Fri	Lunch	2
133	12.26	2.00	Female	No	Thur	Lunch	2
118	12.43	1.80	Female	No	Thur	Lunch	2
99	12.46	1.50	Male	No	Fri	Dinner	2
124	12.48	2.52	Female	No	Thur	Lunch	2
...	...	...	...	...	...	...	...
44	30.40	5.60	Male	No	Sun	Dinner	4
187	30.46	2.00	Male	Yes	Sun	Dinner	5
39	31.27	5.00	Male	No	Sat	Dinner	3
167	31.71	4.50	Male	No	Sun	Dinner	4
173	31.85	3.18	Male	Yes	Sun	Dinner	2
47	32.40	6.00	Male	No	Sun	Dinner	4
83	32.68	5.00	Male	Yes	Thur	Lunch	2
237	32.83	1.17	Male	Yes	Sat	Dinner	2
175	32.90	3.11	Male	Yes	Sun	Dinner	2
141	34.30	6.70	Male	No	Thur	Lunch	6
179	34.63	3.55	Male	Yes	Sun	Dinner	2
180	34.65	3.68	Male	Yes	Sun	Dinner	4
52	34.81	5.20	Female	No	Sun	Dinner	4
85	34.83	5.17	Female	No	Thur	Lunch	4
11	35.26	5.00	Female	No	Sun	Dinner	4
238	35.83	4.67	Female	No	Sat	Dinner	3
56	38.01	3.00	Male	Yes	Sat	Dinner	4
112	38.07	4.00	Male	No	Sun	Dinner	3
207	38.73	3.00	Male	Yes	Sat	Dinner	4
23	39.42	7.58	Male	No	Sat	Dinner	4
95	40.17	4.73	Male	Yes	Fri	Dinner	4
184	40.55	3.00	Male	Yes	Sun	Dinner	2
142	41.19	5.00	Male	No	Thur	Lunch	5
197	43.11	5.00	Female	Yes	Thur	Lunch	4
102	44.30	2.50	Female	Yes	Sat	Dinner	3
182	45.35	3.50	Male	Yes	Sun	Dinner	3
156	48.17	5.00	Male	No	Sun	Dinner	6
59	48.27	6.73	Male	No	Sat	Dinner	4
212	48.33	9.00	Male	No	Sat	Dinner	4
170	50.81	10.00	Male	Yes	Sat	Dinner	3

Output of tips_dataset_sorted.groupby('day').mean():
	total_bill	tip	size
day
Fri	17.151579	2.734737	2.105263
Sat	20.441379	2.993103	2.517241
Sun	21.410000	3.255132	2.842105
Thur	17.682742	2.771452	2.451613

Output of tips_dataset_sorted.groupby('day').sum():
	total_bill	tip	size
day
Fri	325.88	51.96	40
Sat	1778.40	260.40	219
Sun	1627.16	247.39	216
Thur	1096.33	171.83	152