Grouping and Apply
Using Groupby to Apply a Function
You are not limited to the built-in aggregation functions; you can also pass a function of your own to be applied to each group. This is a powerful feature worth mastering, as it can save a lot of time. Take the percent of daily sales example from earlier: the same result can be computed with the apply function. To begin, we need to write a function that does the calculation for a single group.
The first step is to get an example of one slice of the grouped data to test against. Define the grouping object below.
#Create the group_obj
group_obj = sales.groupby('Day')
The object has an attribute, groups, that denotes which indices go to which group.
#Print the groups
print(group_obj.groups)
{1: Int64Index([0, 1, 2, 3, 4, 5], dtype='int64'), 2: Int64Index([6, 7, 8, 9, 10, 11], dtype='int64'), 3: Int64Index([12, 13, 14, 15, 16, 17], dtype='int64')}
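The sales dataframe itself is defined earlier in the course. As a minimal sketch for following along, it can be rebuilt from the values printed in this lesson; the construction below is an assumption about how it was created, not the original code. It also loops over groups to show the row labels held under each key.

```python
import pandas as pd

# Values copied from the tables printed in this lesson.
# How the course originally built `sales` is not shown, so this is an assumption.
sales = pd.DataFrame({
    "Day":     [1] * 6 + [2] * 6 + [3] * 6,
    "Store":   [1, 1, 1, 2, 2, 2] * 3,
    "Product": [1, 2, 3] * 6,
    "Sales":   [10, 74, 27, 41, 66, 95,
                1, 23, 67, 86, 87, 19,
                6, 30, 55, 68, 32, 50],
})

group_obj = sales.groupby("Day")

# groups maps each key (a Day value) to the row labels belonging to it.
for day, labels in group_obj.groups.items():
    print(day, labels.tolist())
```

The exact index type shown when printing groups directly can differ between pandas versions, which is why the sketch converts each entry to a plain list.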
By calling get_group() with the key for the group that you want, you can get a group sample!
#Get the sample group
sample_group = group_obj.get_group(1)
print(sample_group)
   Day  Store  Product  Sales
0    1      1        1     10
1    1      1        2     74
2    1      1        3     27
3    1      2        1     41
4    1      2        2     66
5    1      2        3     95
Now to build the function. It needs to take one parameter, which is the data passed in for each group. Let's call it g; from there we can index into the Sales column and divide it by its sum!
#Create the function
def find_pct(g):
    pct = g['Sales'] / g['Sales'].sum()
    return pct
#Test our function
print(find_pct(sample_group))
0 0.031949
1 0.236422
2 0.086262
3 0.130990
4 0.210863
5 0.303514
Name: Sales, dtype: float64
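A quick sanity check on a function like this is that the shares within a single group add up to 1. The sketch below rebuilds the Day 1 sample from the values printed above (an assumption about the data, since the original sales dataframe is defined elsewhere) and checks the total.

```python
import pandas as pd

def find_pct(g):
    # Share of the group's total sales contributed by each row.
    return g["Sales"] / g["Sales"].sum()

# Day 1 rows, copied from the sample group printed in this lesson.
sample_group = pd.DataFrame({
    "Day":     [1] * 6,
    "Store":   [1, 1, 1, 2, 2, 2],
    "Product": [1, 2, 3] * 2,
    "Sales":   [10, 74, 27, 41, 66, 95],
})

pct = find_pct(sample_group)

# The shares of one group should sum to 1 (up to floating-point error).
print(pct.sum())
```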
Finally, to run this on every group, we call groupby and then apply, passing in this function.
#Apply the function
print(sales.groupby("Day").apply(find_pct))
Day
1    0     0.031949
     1     0.236422
     2     0.086262
     3     0.130990
     4     0.210863
     5     0.303514
2    6     0.003534
     7     0.081272
     8     0.236749
     9     0.303887
     10    0.307420
     11    0.067138
3    12    0.024896
     13    0.124481
     14    0.228216
     15    0.282158
     16    0.132780
     17    0.207469
Name: Sales, dtype: float64
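To see where the two-level index in this output comes from, it can help to spell out the split, apply, and combine steps by hand. This is only an illustrative sketch, not how pandas implements apply internally; the names pieces and manual are made up for the example, and the data is rebuilt from the values printed in this lesson.

```python
import pandas as pd

sales = pd.DataFrame({
    "Day":     [1] * 6 + [2] * 6 + [3] * 6,
    "Store":   [1, 1, 1, 2, 2, 2] * 3,
    "Product": [1, 2, 3] * 6,
    "Sales":   [10, 74, 27, 41, 66, 95,
                1, 23, 67, 86, 87, 19,
                6, 30, 55, 68, 32, 50],
})

def find_pct(g):
    return g["Sales"] / g["Sales"].sum()

# Split: iterating a groupby yields (key, sub-dataframe) pairs.
# Apply: call find_pct on each sub-dataframe.
pieces = {day: find_pct(g) for day, g in sales.groupby("Day")}

# Combine: stack the per-group results, using the keys as an outer index level.
manual = pd.concat(pieces, names=["Day"])

# Selecting on the outer level returns the shares for a single day.
print(manual.loc[2])
```

The outer level holds the group key and the inner level holds the original row labels, which is the same shape as the printed result.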
One modification you may want to make is to drop the grouping column from the index. This can be done by setting group_keys=False in the groupby.
#Apply the function without the day index
print(sales.groupby("Day", group_keys=False).apply(find_pct))
0 0.031949
1 0.236422
2 0.086262
3 0.130990
4 0.210863
5 0.303514
6 0.003534
7 0.081272
8 0.236749
9 0.303887
10 0.307420
11 0.067138
12 0.024896
13 0.124481
14 0.228216
15 0.282158
16 0.132780
17 0.207469
Name: Sales, dtype: float64
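With group_keys=False the result carries only the original row labels, and pandas matches values to rows by label when a Series is assigned as a column. The sketch below illustrates this by shuffling the rows first; the shuffled copy and the random_state value are assumptions for the demonstration. It also selects the Sales column before calling apply, a small variation on the call above, so that the function receives a Series.

```python
import pandas as pd

sales = pd.DataFrame({
    "Day":     [1] * 6 + [2] * 6 + [3] * 6,
    "Store":   [1, 1, 1, 2, 2, 2] * 3,
    "Product": [1, 2, 3] * 6,
    "Sales":   [10, 74, 27, 41, 66, 95,
                1, 23, 67, 86, 87, 19,
                6, 30, 55, 68, 32, 50],
})

# Shuffle the rows; the row labels travel with the rows.
shuffled = sales.sample(frac=1, random_state=0).copy()

# Percent of daily sales, keyed by the original row labels only.
pct = (
    shuffled.groupby("Day", group_keys=False)["Sales"]
    .apply(lambda s: s / s.sum())
)

# Assignment aligns on row labels, so the row order does not matter.
shuffled["Percent Daily Sales"] = pct
print(shuffled.sort_index().head())
```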
You can add this data to the dataframe by creating a new column and setting its values equal to the result.
#Create the new column
sales["Percent Daily Sales"] = sales.groupby("Day", group_keys=False).apply(find_pct)
print(sales)
    Day  Store  Product  Sales  Percent Daily Sales
0     1      1        1     10             0.031949
1     1      1        2     74             0.236422
2     1      1        3     27             0.086262
3     1      2        1     41             0.130990
4     1      2        2     66             0.210863
5     1      2        3     95             0.303514
6     2      1        1      1             0.003534
7     2      1        2     23             0.081272
8     2      1        3     67             0.236749
9     2      2        1     86             0.303887
10    2      2        2     87             0.307420
11    2      2        3     19             0.067138
12    3      1        1      6             0.024896
13    3      1        2     30             0.124481
14    3      1        3     55             0.228216
15    3      2        1     68             0.282158
16    3      2        2     32             0.132780
17    3      2        3     50             0.207469
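For this particular calculation there is also a built-in route that avoids a custom function. The sketch below uses groupby with transform("sum"), which is a different technique from the apply approach in this lesson, to broadcast each day's total back onto its rows. The name daily_total is made up for the example, and the data is rebuilt from the values printed above.

```python
import pandas as pd

sales = pd.DataFrame({
    "Day":     [1] * 6 + [2] * 6 + [3] * 6,
    "Store":   [1, 1, 1, 2, 2, 2] * 3,
    "Product": [1, 2, 3] * 6,
    "Sales":   [10, 74, 27, 41, 66, 95,
                1, 23, 67, 86, 87, 19,
                6, 30, 55, 68, 32, 50],
})

# transform returns one value per original row: the total for that row's day.
daily_total = sales.groupby("Day")["Sales"].transform("sum")

# Dividing row by row gives the same shares as the apply-based version.
sales["Percent Daily Sales"] = sales["Sales"] / daily_total
print(sales.head())
```

A custom function passed to apply is still the more general tool when the per-group logic does not reduce to a single built-in aggregation.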