Window Functions
Window functions let us apply a calculation over segments of our data. To begin with, let’s create a set of data that stands in for daily sales. Don’t worry about the code below; it just creates the data.
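import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#Create 100 days of noisy sales with an upward trend
np.random.seed(3)
daily_sales = pd.Series(np.random.normal(100, 2000, 100))
daily_sales = daily_sales.clip(0, None) + np.arange(100) * 80
#Number the days starting from 1
daily_sales.index = daily_sales.index + 1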
daily_sales.plot(kind='line')
plt.xlabel("Day")
plt.ylabel("Sales")
plt.title("Sales by Day")
plt.show()
Expanding Window
An expanding window goes through each row of data and runs a function on that row and everything before it. For our data there is an obvious application: finding the running total of sales over the days. To use it, we call expanding() on a series or dataframe and then call an aggregation function on the result. The code below computes total sales by calling expanding() and then sum(), which adds up all sales up to and including each day.
#Get the running total with an expanding window
total_sales = daily_sales.expanding().sum()
print(total_sales)
1 3677.256947
2 4730.276648
3 5183.271584
4 5423.271584
5 5743.271584
...
96 440201.979933
97 449340.123133
98 457100.123133
99 464940.123133
100 477276.421817
Length: 100, dtype: float64
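To make the arithmetic easier to follow, here is a minimal sketch of the same idea on a tiny series. The values in toy are made up for illustration and are not part of the sales data. It also shows that, for a plain running total, cumsum() should give the same numbers, and that other aggregations such as mean() and max() can follow expanding() in the same way.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, chosen so the sums are easy to check by hand)
toy = pd.Series([10.0, 20.0, 30.0, 40.0], index=[1, 2, 3, 4])

# Expanding sum: each row is the sum of that row and every row before it
# 10, 10+20, 10+20+30, 10+20+30+40
running_total = toy.expanding().sum()

# For a plain running total, cumsum() produces the same values
same_as_cumsum = np.allclose(running_total, toy.cumsum())

# Other aggregations work the same way after expanding()
running_mean = toy.expanding().mean()  # mean of everything seen so far
running_max = toy.expanding().max()    # largest value seen so far

print(running_total)
print(same_as_cumsum)
print(running_mean)
print(running_max)
```

If all we need is the running total, cumsum() is the shorter spelling; expanding() is the more general tool because any aggregation can follow it.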
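We may also want the running total to start at 0 on day 0. We can do that by using loc to assign 0 to the index label 0. Since that label does not exist yet, pandas adds it as a new entry at the end of the series.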
#Set t=0 to 0
total_sales.loc[0] = 0
print(total_sales)
1 3677.256947
2 4730.276648
3 5183.271584
4 5423.271584
5 5743.271584
...
97 449340.123133
98 457100.123133
99 464940.123133
100 477276.421817
0 0.000000
Length: 101, dtype: float64
The new entry is out of order, though, so we use sort_index() to sort the data by its index.
#Sort total sales
total_sales = total_sales.sort_index()
print(total_sales)
0 0.000000
1 3677.256947
2 4730.276648
3 5183.271584
4 5423.271584
...
96 440201.979933
97 449340.123133
98 457100.123133
99 464940.123133
100 477276.421817
Length: 101, dtype: float64
#Plot total sales
total_sales.plot(kind='line')
plt.xlabel("Day")
plt.ylabel("Sales")
plt.title("Total Sales")
plt.show()
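As an alternative to assigning with loc and then sorting, one option is to build the day 0 entry as its own one-element series and place it in front with pd.concat. This is only a sketch on made-up numbers: totals stands in for total_sales, and the names start and totals_from_zero are introduced here for illustration.

```python
import pandas as pd

# Toy running total indexed by day (hypothetical values standing in for total_sales)
totals = pd.Series([10.0, 30.0, 60.0], index=[1, 2, 3])

# One-element series holding the starting value for day 0
start = pd.Series([0.0], index=[0])

# Putting start first keeps the index in order, so no sort_index() is needed
totals_from_zero = pd.concat([start, totals])

print(totals_from_zero)
```

Both approaches should end with the same series; concat simply avoids the extra sorting step.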
Rolling Window
A rolling window takes the last n rows, including the current one, and runs a calculation on them. For this example, we might want to smooth out our daily sales to see if there is a trend. To do this we can use rolling() followed by mean() to find the rolling mean. You will notice that larger values of n lead to smoother curves. For example:
#Find the average of the last 3 days
avg3d = daily_sales.rolling(3).mean()
#Plot with the regular sales
daily_sales.plot(kind='line')
avg3d.plot(kind='line')
plt.xlabel("Day")
plt.ylabel("Sales")
plt.title("Sales by Day")
plt.legend(['Daily Sales', '3 Day Average'])
plt.show()
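To see exactly what a rolling mean computes, here is a minimal sketch on a tiny made-up series (the values in toy are hypothetical and not taken from the sales data). With the default settings, a window of 3 has no result until 3 rows are available, so the first two entries come out as NaN.

```python
import pandas as pd

# Toy data (hypothetical values, chosen so the averages are easy to check by hand)
toy = pd.Series([3.0, 6.0, 9.0, 12.0, 15.0], index=[1, 2, 3, 4, 5])

# Rolling mean over the current row and the two rows before it
avg3 = toy.rolling(3).mean()

# Rows 1 and 2 do not have three values yet, so they are NaN
# Row 3: (3 + 6 + 9) / 3
# Row 4: (6 + 9 + 12) / 3
# Row 5: (9 + 12 + 15) / 3
print(avg3)
```

This is also why the averaged line in the plot starts a little later than the daily sales line.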
#Find the average of the last 10 days
avg10d = daily_sales.rolling(10).mean()
#Plot with the regular sales
daily_sales.plot(kind='line')
avg10d.plot(kind='line')
plt.xlabel("Day")
plt.ylabel("Sales")
plt.title("Sales by Day")
plt.legend(['Daily Sales', '10 Day Average'])
plt.show()
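By default, rolling(n) leaves the first n - 1 rows empty (NaN) because a full window is not available yet, which is why a 10 day average only starts on day 10. If we would rather have a value for every day, one option is the min_periods parameter of rolling(), which sets how many rows a window needs before it produces a result. The sketch below uses a tiny made-up series rather than the sales data.

```python
import pandas as pd

# Toy data (hypothetical values, chosen so the averages are easy to check by hand)
toy = pd.Series([3.0, 6.0, 9.0, 12.0, 15.0], index=[1, 2, 3, 4, 5])

# Default behavior: the first two rows are NaN for a window of 3
strict = toy.rolling(3).mean()

# min_periods=1: average whatever rows are available, up to 3
# Row 1: 3 / 1, Row 2: (3 + 6) / 2, then full 3-row windows from row 3 on
partial = toy.rolling(3, min_periods=1).mean()

print(strict)
print(partial)
```

Keep in mind that the early values from min_periods=1 are averages over fewer rows, so they are noisier than the rest of the line.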