Pivot
Pivot¶
If you have used pivot tables in excel, this function should feel natural to you. The pivot function will take three columns and then pivot for those values. The first argument is which column indicates the row, the second is which indicates the columns, and the third is which indicates the values. Notice how the sales_location_data looks before and after pivoting.
#Create the data
sales_location_data = pd.DataFrame([["A", 1, 1000],
["B", 1, 5000],
["C", 1, 2000],
["A", 2, 4000],
["B", 2, 2000],
["A", 3, 8000],
["C", 3, 8000]], columns=['Product', 'Location', 'Sales'])
print("Before the pivot:")
print(sales_location_data)
print()
#Pivot
sales_location_data = sales_location_data.pivot('Product', 'Location', 'Sales')
print("After the pivot:")
print(sales_location_data)
If there is not a value then we automatically get a null value. We can fill those with 0 by using fillna.
#Fill null values
sales_location_data = sales_location_data.fillna(0)
print(sales_location_data)
Applying Functions to Different Axes¶
There are many built in functions you can use to find statistics or other measures for either the rows or the columns. For example, if you call mean() for the dataframe we just worked with, we get the average for each of the columns.
print(sales_location_data.mean())
If you give the argument of axis=1, we get the mean by the rows.
print(sales_location_data.mean(axis=1))
The default behavior is axis=0 or the average by the columns.
print(sales_location_data.mean(axis=0))
One thing to note is how these functions will work differently with null values. Let's first bring back the data with null values.
#Create the data
sales_location_data = pd.DataFrame([["A", 1, 1000],
["B", 1, 5000],
["C", 1, 2000],
["A", 2, 4000],
["B", 2, 2000],
["A", 3, 8000],
["C", 3, 8000]], columns=['Product', 'Location', 'Sales'])
#Pivot
sales_location_data = sales_location_data.pivot('Product', 'Location', 'Sales')
print(sales_location_data)
If we call for the mean now, by column, it is going to ignore the null values instead of counting them as 0!
print(sales_location_data.mean())
It is important to decide how to handle the null values because of this! Also there are many other functions you can apply, for example:
#Get a few metrics
print("Maximum:")
print(sales_location_data.max())
print()
print()
print("Minimum:")
print(sales_location_data.min())
print()
print()
print("Sum:")
print(sales_location_data.sum())
print()
print()