Grouping and Apply

Using Groupby to Apply a Function

You are not limited to the built-in aggregation functions; you can also pass a function of your own to be applied to each group. This is powerful functionality that I recommend mastering, as it can save a lot of time. Take the percent of daily sales example we did earlier: it can also be done with the apply function. To begin, we need to write a function that performs the calculation for a single group.
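The code in this lesson relies on a sales DataFrame that is not defined in this section. As a minimal sketch, it can be rebuilt from the values printed in the outputs of this lesson. The way it is constructed here is an assumption; the original may have been loaded or generated differently.

```python
import pandas as pd

# Values copied from the printed output in this lesson.
# The construction itself is an assumption, not the original code.
sales = pd.DataFrame({
    "Day":     [1] * 6 + [2] * 6 + [3] * 6,
    "Store":   [1, 1, 1, 2, 2, 2] * 3,
    "Product": [1, 2, 3] * 6,
    "Sales":   [10, 74, 27, 41, 66, 95,
                1, 23, 67, 86, 87, 19,
                6, 30, 55, 68, 32, 50],
})

# Show the first day's rows
print(sales.head(6))
```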

The first step is to pull out one group to use as an example. Define the grouping object below.

In [12]:

#Create the group_obj
group_obj = sales.groupby('Day')

There is an attribute, groups, within the object that denotes which indices go to which group.

In [13]:

#Print the groups
print(group_obj.groups)

{1: Int64Index([0, 1, 2, 3, 4, 5], dtype='int64'), 2: Int64Index([6, 7, 8, 9, 10, 11], dtype='int64'), 3: Int64Index([12, 13, 14, 15, 16, 17], dtype='int64')}

By calling get_group() with the key of the group that you want, you can get a sample group!

In [14]:

#Get the sample group
sample_group = group_obj.get_group(1)
print(sample_group)

   Day  Store  Product  Sales
0    1      1        1     10
1    1      1        2     74
2    1      1        3     27
3    1      2        1     41
4    1      2        2     66
5    1      2        3     95
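Besides get_group(), a groupby object can be iterated over, which yields each key together with its slice of the data. The sketch below uses a small made-up subset of the sales values (an assumption for brevity) to show that each slice is an ordinary DataFrame, which is exactly what a function passed to apply receives.

```python
import pandas as pd

# Small illustrative subset of the sales data (assumption: two rows per day)
sales_subset = pd.DataFrame({
    "Day":   [1, 1, 2, 2, 3, 3],
    "Sales": [10, 74, 1, 23, 6, 30],
})

group_obj = sales_subset.groupby("Day")

# Each iteration gives the group key and that group's rows as a DataFrame
totals = {}
for key, g in group_obj:
    totals[key] = int(g["Sales"].sum())
    print(key, g.shape, totals[key])
```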

Now to build the function. It needs to take one parameter, which is the data passed in for each group. Let's call it g; from there we can index into the Sales column and divide it by its sum!

In [15]:

#Create the function
def find_pct(g):
    pct = g['Sales'] / g['Sales'].sum()
    return pct

#Test our function
print(find_pct(sample_group))

0    0.031949
1    0.236422
2    0.086262
3    0.130990
4    0.210863
5    0.303514
Name: Sales, dtype: float64
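To see where these numbers come from, here is a short check of the arithmetic using only the Day 1 sales values printed above. The variable names day1 and pct are just for illustration. Day 1 sales total 313, so the first row is 10 / 313, and the percentages for a single day should add up to 1 (up to floating-point rounding).

```python
import pandas as pd

# Day 1 sales values, copied from the sample group above
day1 = pd.DataFrame({"Sales": [10, 74, 27, 41, 66, 95]})

def find_pct(g):
    # Each sale divided by the total for the group
    return g["Sales"] / g["Sales"].sum()

pct = find_pct(day1)

print(day1["Sales"].sum())     # total sales for the day: 313
print(round(pct.iloc[0], 6))   # 10 / 313, rounded: 0.031949
print(pct.sum())               # should be 1, up to rounding error
```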

Finally, to run this, we do groupby and then call apply passing this function.

In [16]:

#Apply the function
print(sales.groupby("Day").apply(find_pct))

Day
1    0     0.031949
     1     0.236422
     2     0.086262
     3     0.130990
     4     0.210863
     5     0.303514
2    6     0.003534
     7     0.081272
     8     0.236749
     9     0.303887
     10    0.307420
     11    0.067138
3    12    0.024896
     13    0.124481
     14    0.228216
     15    0.282158
     16    0.132780
     17    0.207469
Name: Sales, dtype: float64

One modification you may want to make is to get rid of the grouping column as the index. This can be done by setting group_keys=False in the groupby.

In [17]:

#Apply the function without the day index
print(sales.groupby("Day", group_keys=False).apply(find_pct))

0     0.031949
1     0.236422
2     0.086262
3     0.130990
4     0.210863
5     0.303514
6     0.003534
7     0.081272
8     0.236749
9     0.303887
10    0.307420
11    0.067138
12    0.024896
13    0.124481
14    0.228216
15    0.282158
16    0.132780
17    0.207469
Name: Sales, dtype: float64

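You can add this data to the DataFrame by creating a column and setting its values equal to the result. Because the result keeps the original row index, each value lines up with the row it was computed from.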

In [18]:

#Create the new column
sales["Percent Daily Sales"] = sales.groupby("Day", group_keys=False).apply(find_pct)
print(sales)

    Day  Store  Product  Sales  Percent Daily Sales
0     1      1        1     10             0.031949
1     1      1        2     74             0.236422
2     1      1        3     27             0.086262
3     1      2        1     41             0.130990
4     1      2        2     66             0.210863
5     1      2        3     95             0.303514
6     2      1        1      1             0.003534
7     2      1        2     23             0.081272
8     2      1        3     67             0.236749
9     2      2        1     86             0.303887
10    2      2        2     87             0.307420
11    2      2        3     19             0.067138
12    3      1        1      6             0.024896
13    3      1        2     30             0.124481
14    3      1        3     55             0.228216
15    3      2        1     68             0.282158
16    3      2        2     32             0.132780
17    3      2        3     50             0.207469
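As a cross-check, the same column can be computed without a custom function by using transform, which returns the group total repeated on every row of that group. This is an alternative sketch rather than the approach this lesson uses, and the sales DataFrame is rebuilt here from the printed values (an assumption about how it was created).

```python
import pandas as pd

# Rebuilt from the printed output above (construction is an assumption)
sales = pd.DataFrame({
    "Day":     [1] * 6 + [2] * 6 + [3] * 6,
    "Store":   [1, 1, 1, 2, 2, 2] * 3,
    "Product": [1, 2, 3] * 6,
    "Sales":   [10, 74, 27, 41, 66, 95,
                1, 23, 67, 86, 87, 19,
                6, 30, 55, 68, 32, 50],
})

# transform("sum") gives each row its day's total, aligned to the original index
daily_total = sales.groupby("Day")["Sales"].transform("sum")
sales["Percent Daily Sales"] = sales["Sales"] / daily_total

print(sales.round(6))
```

For a simple ratio like this, transform avoids calling a Python function once per group; apply remains the more flexible choice when the per-group logic is more involved.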
