Pivot
If you have used pivot tables in Excel, this function should feel natural to you. The pivot function takes three columns: the index argument names the column whose values become the rows, columns names the column whose values become the new column labels, and values names the column that fills in the cells. Notice how sales_location_data looks before and after pivoting.
#Create the data
import pandas as pd

sales_location_data = pd.DataFrame([["A", 1, 1000],
                                    ["B", 1, 5000],
                                    ["C", 1, 2000],
                                    ["A", 2, 4000],
                                    ["B", 2, 2000],
                                    ["A", 3, 8000],
                                    ["C", 3, 8000]],
                                   columns=['Product', 'Location', 'Sales'])
print("Before the pivot:")
print(sales_location_data)
print()
#Pivot: rows from Product, columns from Location, cell values from Sales
sales_location_data = sales_location_data.pivot(index='Product',
                                                columns='Location',
                                                values='Sales')
print("After the pivot:")
print(sales_location_data)
Before the pivot:
Product Location Sales
0 A 1 1000
1 B 1 5000
2 C 1 2000
3 A 2 4000
4 B 2 2000
5 A 3 8000
6 C 3 8000
After the pivot:
Location 1 2 3
Product
A 1000.0 4000.0 8000.0
B 5000.0 2000.0 NaN
C 2000.0 NaN 8000.0
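One caveat worth knowing: pivot raises a ValueError if the same (index, columns) pair appears more than once, because it cannot decide which value to keep. A minimal sketch (the duplicate rows below are made-up data, not from the example above) using pivot_table, which aggregates duplicates instead:

```python
import pandas as pd

# Hypothetical data where ("A", 1) appears twice
dup_data = pd.DataFrame([["A", 1, 1000],
                         ["A", 1, 500],
                         ["B", 1, 5000]],
                        columns=['Product', 'Location', 'Sales'])

# pivot() would raise a ValueError here because of the duplicate
# (Product, Location) pair; pivot_table() aggregates the duplicates,
# here by summing them
result = dup_data.pivot_table(index='Product', columns='Location',
                              values='Sales', aggfunc='sum')
print(result)
```

With aggfunc='sum', the two rows for product A at location 1 collapse into a single cell holding 1500.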
Where there is no value for a row/column combination, we automatically get a null value (NaN). We can fill those with 0 by using fillna.
#Fill null values
sales_location_data = sales_location_data.fillna(0)
print(sales_location_data)
Location 1 2 3
Product
A 1000.0 4000.0 8000.0
B 5000.0 2000.0 0.0
C 2000.0 0.0 8000.0
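As a variation (a sketch, rebuilding the same toy numbers), fillna also accepts a dict mapping column labels to fill values, so different columns can receive different defaults:

```python
import pandas as pd
import numpy as np

# Rebuild the pivoted table with its missing entries
pivoted = pd.DataFrame({1: [1000.0, 5000.0, 2000.0],
                        2: [4000.0, 2000.0, np.nan],
                        3: [8000.0, np.nan, 8000.0]},
                       index=pd.Index(['A', 'B', 'C'], name='Product'))
pivoted.columns.name = 'Location'

# Fill NaN in column 2 with 0 but in column 3 with -1
filled = pivoted.fillna({2: 0, 3: -1})
print(filled)
```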
Applying Functions to Different Axes
There are many built-in functions you can use to find statistics or other measures across either the rows or the columns. For example, calling mean() on the DataFrame we just worked with returns the average of each column.
print(sales_location_data.mean())
Location
1 2666.666667
2 2000.000000
3 5333.333333
dtype: float64
If you pass the argument axis=1, we get the mean of each row.
print(sales_location_data.mean(axis=1))
Product
A 4333.333333
B 2333.333333
C 3333.333333
dtype: float64
The default behavior is axis=0, the average of each column.
print(sales_location_data.mean(axis=0))
Location
1 2666.666667
2 2000.000000
3 5333.333333
dtype: float64
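Beyond the built-in statistics, apply lets you run your own function along either axis. A short sketch, rebuilding the filled table from above, that computes the max-minus-min range of each row:

```python
import pandas as pd

# Rebuild the pivoted table after filling nulls with 0
sales = pd.DataFrame({1: [1000.0, 5000.0, 2000.0],
                      2: [4000.0, 2000.0, 0.0],
                      3: [8000.0, 0.0, 8000.0]},
                     index=pd.Index(['A', 'B', 'C'], name='Product'))
sales.columns.name = 'Location'

# apply() runs a function over each column (axis=0) or each row
# (axis=1); here we get the spread of sales for each product
row_range = sales.apply(lambda s: s.max() - s.min(), axis=1)
print(row_range)
```

Each product's result is its largest sales figure minus its smallest, e.g. 8000 - 1000 = 7000 for product A.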
One thing to note is that these functions work differently with null values. Let's first bring back the data with null values.
#Create the data
sales_location_data = pd.DataFrame([["A", 1, 1000],
                                    ["B", 1, 5000],
                                    ["C", 1, 2000],
                                    ["A", 2, 4000],
                                    ["B", 2, 2000],
                                    ["A", 3, 8000],
                                    ["C", 3, 8000]],
                                   columns=['Product', 'Location', 'Sales'])
#Pivot
sales_location_data = sales_location_data.pivot(index='Product',
                                                columns='Location',
                                                values='Sales')
print(sales_location_data)
Location 1 2 3
Product
A 1000.0 4000.0 8000.0
B 5000.0 2000.0 NaN
C 2000.0 NaN 8000.0
If we call mean now, by column, it ignores the null values instead of counting them as 0!
print(sales_location_data.mean())
Location
1 2666.666667
2 3000.000000
3 8000.000000
dtype: float64
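If you would rather have missing values propagate instead of being skipped, mean (like most of these reductions) accepts skipna=False. A quick sketch on the same data:

```python
import pandas as pd
import numpy as np

# Rebuild the pivoted table with its missing entries
sales = pd.DataFrame({1: [1000.0, 5000.0, 2000.0],
                      2: [4000.0, 2000.0, np.nan],
                      3: [8000.0, np.nan, 8000.0]},
                     index=pd.Index(['A', 'B', 'C'], name='Product'))
sales.columns.name = 'Location'

# With skipna=False, any column containing a NaN gets NaN as its mean
col_means = sales.mean(skipna=False)
print(col_means)
```

Only location 1, which has no missing sales, gets a numeric mean; locations 2 and 3 come back as NaN.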
Because of this, it is important to decide how to handle null values! There are also many other functions you can apply, for example:
#Get a few metrics
print("Maximum:")
print(sales_location_data.max())
print()
print()
print("Minimum:")
print(sales_location_data.min())
print()
print()
print("Sum:")
print(sales_location_data.sum())
print()
print()
Maximum:
Location
1 5000.0
2 4000.0
3 8000.0
dtype: float64
Minimum:
Location
1 1000.0
2 2000.0
3 8000.0
dtype: float64
Sum:
Location
1 8000.0
2 6000.0
3 16000.0
dtype: float64
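Rather than calling max, min, and sum one at a time, agg can compute several statistics in a single call. A sketch on the same data:

```python
import pandas as pd
import numpy as np

# Rebuild the pivoted table with its missing entries
sales = pd.DataFrame({1: [1000.0, 5000.0, 2000.0],
                      2: [4000.0, 2000.0, np.nan],
                      3: [8000.0, np.nan, 8000.0]},
                     index=pd.Index(['A', 'B', 'C'], name='Product'))
sales.columns.name = 'Location'

# agg() applies each named reduction to every column; the result
# has one row per statistic and one column per Location
summary = sales.agg(['max', 'min', 'sum'])
print(summary)
```

As with the individual calls, NaN values are skipped by default, so the sums match the output above.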