-
Pandas Basics 5
-
Lecture1.1
-
Lecture1.2
-
Lecture1.3
-
Lecture1.4
-
Lecture1.5
-
-
Data Transformations 6
-
Lecture2.1
-
Lecture2.2
-
Lecture2.3
-
Lecture2.4
-
Lecture2.5
-
Lecture2.6
-
-
Statistics 4
-
Lecture3.1
-
Lecture3.2
-
Lecture3.3
-
Lecture3.4
-
-
Reading and Writing Data 3
-
Lecture4.1
-
Lecture4.2
-
Lecture4.3
-
-
Joins 5
-
Lecture5.1
-
Lecture5.2
-
Lecture5.3
-
Lecture5.4
-
Lecture5.5
-
-
Grouping 4
-
Lecture6.1
-
Lecture6.2
-
Lecture6.3
-
Lecture6.4
-
-
Introduction to Numpy 4
-
Lecture7.1
-
Lecture7.2
-
Lecture7.3
-
Lecture7.4
-
-
Randomness 2
-
Lecture8.1
-
Lecture8.2
-
-
Numpy Data Functionality 1
-
Lecture9.1
-
Multi-index Grouping
Multi-index Grouping¶
You are not limited to just using a single column for grouping, you can also pass in a list to do the grouping based on multiple groupings. Let’s take an example by bringing back the original dataset again. One thing to keep in mind is you will get back data with a multi-index based on these groupings.
In [9]:
#Create the data
sales = pd.DataFrame([[1, 1, 1, 10],
[1, 1, 2, 74],
[1, 1, 3, 27],
[1, 2, 1, 41],
[1, 2, 2, 66],
[1, 2, 3, 95],
[2, 1, 1, 1],
[2, 1, 2, 23],
[2, 1, 3, 67],
[2, 2, 1, 86],
[2, 2, 2, 87],
[2, 2, 3, 19],
[3, 1, 1, 6],
[3, 1, 2, 30],
[3, 1, 3, 55],
[3, 2, 1, 68],
[3, 2, 2, 32],
[3, 2, 3, 50]],
columns=['Day', 'Store', 'Product', 'Sales'])
print(sales)
Day Store Product Sales
0 1 1 1 10
1 1 1 2 74
2 1 1 3 27
3 1 2 1 41
4 1 2 2 66
5 1 2 3 95
6 2 1 1 1
7 2 1 2 23
8 2 1 3 67
9 2 2 1 86
10 2 2 2 87
11 2 2 3 19
12 3 1 1 6
13 3 1 2 30
14 3 1 3 55
15 3 2 1 68
16 3 2 2 32
17 3 2 3 50
Now what if we wanted to find the total sales by day and product? We can use both of them in grouping like below:
In [10]:
#Group by day and product
group_data = sales.groupby(['Day', 'Product'])['Sales'].sum()
print(group_data)
Day Product
1 1 51
2 140
3 122
2 1 87
2 110
3 86
3 1 74
2 62
3 105
Name: Sales, dtype: int64
You might also want something like average sales for each store-day combo.
In [11]:
group_data = sales.groupby(['Store', 'Day'])['Sales'].mean()
print(group_data)
Store Day
1 1 37.000000
2 30.333333
3 30.333333
2 1 67.333333
2 64.000000
3 50.000000
Name: Sales, dtype: float64
Next
Grouping and Apply