Seasonal Decomposition
Multiplicative Seasonal Decomposition
Another way to smooth out the data and find insights is to use a decomposition, which splits a series into components according to the following model:
$ Y_{i} = T_{i} * S_{i} * \epsilon_{i} $
where
$ Y_{i} = \text{Series Value at time i} $
$ T_{i} = \text{Trend Value at time i} $
$ S_{i} = \text{Seasonal Value at time i} $
$ \epsilon_{i} = \text{Error Value at time i} $
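To make the model concrete, the sketch below builds a small synthetic monthly series by multiplying a trend, a repeating seasonal pattern, and a noise term. All of the numbers and names here (`trend`, `pattern`, `noise`, `Y_synthetic`) are made up for illustration and are unrelated to the Airbnb data used in the rest of this section.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 4 years of monthly observations
rng = np.random.default_rng(0)
dates = pd.date_range("2012-01-01", periods=48, freq="MS")

# Trend: a steadily rising level
trend = np.linspace(10, 60, 48)

# Seasonal: a 12-month pattern that averages to 1, repeated once per year
pattern = 1 + 0.2 * np.sin(2 * np.pi * np.arange(12) / 12)
seasonal = np.tile(pattern, 4)

# Error: multiplicative noise centered near 1
noise = rng.normal(loc=1.0, scale=0.02, size=48)

# The multiplicative model: Y = T * S * error
Y_synthetic = pd.Series(trend * seasonal * noise, index=dates, name="Synthetic")
print(Y_synthetic.head())
```

Because the pieces are multiplied rather than added, the seasonal swings in a series like this grow as the trend grows, which is the situation where a multiplicative decomposition tends to be the natural choice.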
We will work through how each piece (the trend and the seasonal effect) is found after first using an existing implementation of seasonal decomposition. The seasonal_decompose function within statsmodels takes a series of data to decompose and can optionally take the model (in this case we use multiplicative) as well as the period (in this case we use 12 because the data is monthly and we want to see the effect of each month). Finally, we can use the plot() function on the object we get back to plot the trend and seasonality.
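import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# airbnb is assumed to be a DataFrame that has already been loaded, indexed by date
Y = airbnb['Search']
# Run the decomposition
result = seasonal_decompose(Y, model='multiplicative', period=12)
# Plot the decomposition
result.plot()
plt.show()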
Computing the Trend Component
In seasonal_decompose, the trend is a moving average whose window is based on the period P. If the period is odd, then the moving average is a simple average of the P points centered on $Y_{t}$. First, to clean up the formula, let's define a variable B for the lower/upper bound as:
$B = \frac{P-1}{2}$
$T_{t} = \frac{Y_{t-B} + Y_{t-B+1}+...+Y_{t}+...+Y_{t+B-1}+Y_{t+B}}{P} $
If the period is even, no window of P points can be centered on $Y_{t}$, so we instead use P+1 data points and give each of the two end points a weight of one half. The equation becomes (notice the .5 multiplied on the end values):
$B = \frac{P}{2}$
$T_{t} = \frac{.5 * Y_{t-B} + Y_{t-B+1}+...+Y_{t}+...+Y_{t+B-1}+.5*Y_{t+B}}{P} $
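As a worked instance, take the monthly case $P = 12$ used in this section. Then $B = 6$, and the trend at time t uses 13 points. The 11 interior points have weight 1 and the two end points have weight .5, so the weights still sum to 12:
$T_{t} = \frac{.5 * Y_{t-6} + Y_{t-5}+...+Y_{t}+...+Y_{t+5}+.5*Y_{t+6}}{12} $
Since the formula needs six observations on each side of $Y_{t}$, the first six and last six points of the series have no trend value.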
#Print out the trend values for the decomposition
print(result.trend.dropna())
#Compute the trend component based on period of 12 to check
trend = sum([Y.shift(x) for x in range(-5, 6)]) + Y.shift(-6) * .5 + Y.shift(6) * .5
trend = trend / 12
print(trend.dropna())
Date
2012-07-01 4.666667
2012-08-01 4.958333
2012-09-01 5.250000
2012-10-01 5.583333
2012-11-01 5.958333
...
2019-02-01 70.958333
2019-03-01 72.000000
2019-04-01 73.083333
2019-05-01 74.208333
2019-06-01 75.166667
Name: trend, Length: 84, dtype: float64
Date
2012-07-01 4.666667
2012-08-01 4.958333
2012-09-01 5.250000
2012-10-01 5.583333
2012-11-01 5.958333
...
2019-02-01 70.958333
2019-03-01 72.000000
2019-04-01 73.083333
2019-05-01 74.208333
2019-06-01 75.166667
Name: Search, Length: 84, dtype: float64
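The odd and even cases above can be folded into one function. The sketch below is a hypothetical helper (the name `centered_moving_average` and the `demo` data are not part of statsmodels or of the course data), written only to show how the two formulas translate to code with `shift`.

```python
import numpy as np
import pandas as pd

def centered_moving_average(series, period):
    """Centered moving average following the odd/even rules above (hypothetical helper)."""
    if period % 2 == 1:
        # Odd period: plain average of the P points centered on t
        half = (period - 1) // 2
        total = sum(series.shift(k) for k in range(-half, half + 1))
    else:
        # Even period: P + 1 points, with the two end points weighted by one half
        half = period // 2
        total = sum(series.shift(k) for k in range(-half + 1, half))
        total = total + 0.5 * series.shift(-half) + 0.5 * series.shift(half)
    return total / period

# Made-up data for illustration only
demo = pd.Series(np.arange(1.0, 25.0))
cma_even = centered_moving_average(demo, 12)
cma_odd = centered_moving_average(demo, 3)
print(cma_even.dropna())
print(cma_odd.dropna())
```

On a perfectly linear series like `demo`, a centered average should return the series itself wherever the window fits, which makes it a convenient check that the weights are symmetric.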
Computing the Seasonal Component
The seasonal component is found by grouping the observations by season (in this case the month), averaging the value divided by the trend within each season, and finally dividing these seasonal averages by their overall average. We will skip the math and go straight to the code, since that is easier to follow.
For each season, we need the average of the value divided by the trend. Let's work through it step by step, using January as an example. First, create the de-trended data by dividing each value by its trend.
Y_detrended = Y / trend
Y_detrended.plot(kind='line')
plt.show()
Now for January, we need to find each value which falls in that month. Because January is the first value in the series and the pattern repeats every 12 observations, we can take every 12th value starting from the first:
print(Y_detrended.iloc[::12])
Date
2012-01-01 NaN
2013-01-01 0.878049
2014-01-01 0.962963
2015-01-01 0.834783
2016-01-01 0.896703
2017-01-01 0.991896
2018-01-01 1.038689
2019-01-01 1.029184
Name: Search, dtype: float64
Take the mean of these numbers.
print(Y_detrended.iloc[::12].mean())
0.9474666405490085
Generalize this to every month by using a list comprehension that changes the starting index for each month but keeps the same step of 12.
import numpy as np
seasonal = np.array([Y_detrended.iloc[i::12].mean() for i in range(12)])
print(seasonal)
[0.94746664 0.92398382 1.01092864 1.02782975 1.12638624 1.22338223
1.21746959 1.12768306 0.91495187 0.8728329 0.78663362 0.76479533]
The final step is to divide these numbers by their average so that the seasonal values average to 1.
#Normalize the seasonal values
seasonal = seasonal/seasonal.mean()
print(seasonal)
print()
#Check they are the same as the output of statsmodels
print(result.seasonal.values[:12])
[0.95188149 0.92828925 1.0156392 1.03261906 1.13163479 1.22908275
1.22314256 1.13293765 0.91921522 0.87689999 0.79029905 0.768359 ]
[0.95188149 0.92828925 1.0156392 1.03261906 1.13163479 1.22908275
1.22314256 1.13293765 0.91921522 0.87689999 0.79029905 0.768359 ]
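The same two steps (average by season, then normalize) can also be written with a groupby on each observation's position within the cycle, which avoids hard-coding the 12. This is a sketch under assumptions: `seasonal_indices` is a hypothetical helper name and the `demo` values are made up. Like the slicing approach, it assumes the series starts at the first season of a cycle.

```python
import numpy as np
import pandas as pd

def seasonal_indices(detrended, period):
    """Average the de-trended values by position in the cycle, then normalize (hypothetical helper)."""
    # Position of each observation within its cycle: 0, 1, ..., period - 1, 0, 1, ...
    position = np.arange(len(detrended)) % period
    # Mean of the de-trended values for each position; NaN values are skipped by default
    means = detrended.groupby(position).mean()
    # Divide by the overall average so the indices average to 1
    return (means / means.mean()).to_numpy()

# Made-up de-trended values: two cycles of a 4-season pattern
demo = pd.Series([0.8, 1.0, 1.4, 1.0, 0.6, 1.0, 1.2, 1.0])
indices = seasonal_indices(demo, 4)
print(indices)
```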
The Residual
The residual follows directly from the earlier results; it is just a matter of solving the model equation (replicated below) for the error term.
$ Y_{i} = T_{i} * S_{i} * \epsilon_{i} $
Our seasonal values are of length 12 right now, but we need them to repeat once for each year of data. The tile function from numpy will do this for us, and since we have 8 years, we tile with 8 repeats.
seasonal = np.tile(seasonal, 8)
print(seasonal)
[0.95188149 0.92828925 1.0156392 1.03261906 1.13163479 1.22908275
1.22314256 1.13293765 0.91921522 0.87689999 0.79029905 0.768359
0.95188149 0.92828925 1.0156392 1.03261906 1.13163479 1.22908275
1.22314256 1.13293765 0.91921522 0.87689999 0.79029905 0.768359
0.95188149 0.92828925 1.0156392 1.03261906 1.13163479 1.22908275
1.22314256 1.13293765 0.91921522 0.87689999 0.79029905 0.768359
0.95188149 0.92828925 1.0156392 1.03261906 1.13163479 1.22908275
1.22314256 1.13293765 0.91921522 0.87689999 0.79029905 0.768359
0.95188149 0.92828925 1.0156392 1.03261906 1.13163479 1.22908275
1.22314256 1.13293765 0.91921522 0.87689999 0.79029905 0.768359
0.95188149 0.92828925 1.0156392 1.03261906 1.13163479 1.22908275
1.22314256 1.13293765 0.91921522 0.87689999 0.79029905 0.768359
0.95188149 0.92828925 1.0156392 1.03261906 1.13163479 1.22908275
1.22314256 1.13293765 0.91921522 0.87689999 0.79029905 0.768359
0.95188149 0.92828925 1.0156392 1.03261906 1.13163479 1.22908275
1.22314256 1.13293765 0.91921522 0.87689999 0.79029905 0.768359 ]
#Compute the residual
residual = Y / trend / seasonal
print(residual.dropna())
print()
#Check the values match
print(result.resid.dropna())
Date
2012-07-01 0.875964
2012-08-01 1.068094
2012-09-01 1.036080
2012-10-01 1.021237
2012-11-01 1.061827
...
2019-02-01 1.062702
2019-03-01 1.025627
2019-04-01 0.967307
2019-05-01 0.952645
2019-06-01 1.006645
Name: Search, Length: 84, dtype: float64
Date
2012-07-01 0.875964
2012-08-01 1.068094
2012-09-01 1.036080
2012-10-01 1.021237
2012-11-01 1.061827
...
2019-02-01 1.062702
2019-03-01 1.025627
2019-04-01 0.967307
2019-05-01 0.952645
2019-06-01 1.006645
Name: resid, Length: 84, dtype: float64
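Two follow-ups are worth sketching. First, multiplying the three components back together should reproduce the original series wherever the trend is defined. Second, tiling with a fixed number of repeats only works when the data covers a whole number of cycles; `np.resize` repeats the pattern cyclically up to any length instead. The example below uses made-up numbers and a period of 4, not the Airbnb data.

```python
import numpy as np
import pandas as pd

# Made-up series: 10 observations, which is not a whole number of 4-step cycles
values = pd.Series([8.0, 10.0, 14.0, 10.0, 12.0, 15.0, 21.0, 15.0, 16.0, 20.0])
period = 4

# Centered moving average for an even period (end points weighted by one half)
half = period // 2
trend = sum(values.shift(k) for k in range(-half + 1, half))
trend = (trend + 0.5 * values.shift(-half) + 0.5 * values.shift(half)) / period

# Seasonal indices: average de-trended value per position, normalized to average 1
detrended = values / trend
pattern = np.array([detrended.iloc[i::period].mean() for i in range(period)])
pattern = pattern / pattern.mean()

# np.resize repeats the pattern cyclically until it reaches the requested length
seasonal = np.resize(pattern, len(values))

# Residual from the multiplicative model, then multiply the pieces back together
residual = values / trend / seasonal
rebuilt = trend * seasonal * residual

# Wherever the trend exists, the rebuilt series should match the original
valid = trend.notna()
print(np.allclose(rebuilt[valid], values[valid]))
```

A residual close to 1 means the trend and seasonal components explain that observation well; values far from 1 flag points the model does not capture.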