-
Geographical Analysis 6
-
Lecture1.1
-
Lecture1.2
-
Lecture1.3
-
Lecture1.4
-
Lecture1.5
-
Lecture1.6
-
-
Cap Table 3
-
Lecture2.1
-
Lecture2.2
-
Lecture2.3
-
-
Simulation 6
-
Lecture3.1
-
Lecture3.2
-
Lecture3.3
-
Lecture3.4
-
Lecture3.5
-
Lecture3.6
-
-
Search Index 8
-
Lecture4.1
-
Lecture4.2
-
Lecture4.3
-
Lecture4.4
-
Lecture4.5
-
Lecture4.6
-
Lecture4.7
-
Lecture4.8
-
-
Fund Distributions 5
-
Lecture5.1
-
Lecture5.2
-
Lecture5.3
-
Lecture5.4
-
Lecture5.5
-
Search index
The Google Search Index¶
A free tool which can be used to measure interest in companies based on search volume is the google search index. By utilizing this index, we can review trends in searches to compare the overall time trend as well as any components of seasonality. An important thing to note is that these are all relative values. When we go to compare multiple searches, the values are normalized to be within a range of 0-100 relative to one and other (the terms) and the months that we are analyzing. Our first step, as always, is reading in and cleaning the data.
import pandas as pd
#Read the data
df = pd.read_csv("AirbnbSearch.csv", index_col=0)
print(df)
Airbnb Vrbo HomeAway Booking.com
Month
2004-01 0 1 0 0
2004-02 0 1 0 0
2004-03 0 1 0 0
2004-04 0 2 0 0
2004-05 0 2 0 0
... ... ... ... ...
2020-08 90 33 2 4
2020-09 73 27 1 3
2020-10 62 22 1 3
2020-11 53 17 1 2
2020-12 95 18 1 2
[204 rows x 4 columns]
Something you may notice is that certain days are not proper numbers, but instead represented as "<1" as a string. Below we see an example of this.
print(df.loc['2010-01-01':])
Airbnb Vrbo HomeAway Booking.com
Month
2010-02 <1 12 2 1
2010-03 <1 12 2 1
2010-04 <1 11 1 1
2010-05 <1 11 1 2
2010-06 <1 14 2 2
... ... ... ... ...
2020-08 90 33 2 4
2020-09 73 27 1 3
2020-10 62 22 1 3
2020-11 53 17 1 2
2020-12 95 18 1 2
[131 rows x 4 columns]
By using applymap, we apply a function to each cell within the dataframe. The problem with this data is that we are dealing with integers as well as strings so if we try to apply a function for a string to an integer we will end up causing an error. To fix this, we can create a lambda function where we use if and else. The way to do this is to write a statement as "A if X else B" where A and B are the pieces of code to execute and X is the condition to decide which we execute. Below, we will replace "<1" with ".5" and then convert to a floating point number in the case that the type of the cell is a string, otherwise we just return the value because it is numerical.
#Convert <1 to be .5
df = df.applymap(lambda x: float(x.replace("<1", ".5")) if type(x) == str else x)
print(df)
Airbnb Vrbo HomeAway Booking.com
Month
2004-01 0.0 1 0.0 0.0
2004-02 0.0 1 0.0 0.0
2004-03 0.0 1 0.0 0.0
2004-04 0.0 2 0.0 0.0
2004-05 0.0 2 0.0 0.0
... ... ... ... ...
2020-08 90.0 33 2.0 4.0
2020-09 73.0 27 1.0 3.0
2020-10 62.0 22 1.0 3.0
2020-11 53.0 17 1.0 2.0
2020-12 95.0 18 1.0 2.0
[204 rows x 4 columns]
Another modification we need to do is to convert the index of this dataframe to be a datetime object instead of a string.
df.index = pd.to_datetime(df.index)
print(df)
Airbnb Vrbo HomeAway Booking.com
Month
2004-01-01 0.0 1 0.0 0.0
2004-02-01 0.0 1 0.0 0.0
2004-03-01 0.0 1 0.0 0.0
2004-04-01 0.0 2 0.0 0.0
2004-05-01 0.0 2 0.0 0.0
... ... ... ... ...
2020-08-01 90.0 33 2.0 4.0
2020-09-01 73.0 27 1.0 3.0
2020-10-01 62.0 22 1.0 3.0
2020-11-01 53.0 17 1.0 2.0
2020-12-01 95.0 18 1.0 2.0
[204 rows x 4 columns]
We will also want to see what the data actually looks like. Because we are using a dataframe it is relatively easy to just plot this result.
import matplotlib.pyplot as plt
#Plot the data
ax = df.plot(kind='line')
ax.set_xlabel("Year")
ax.set_ylabel("Search Index")
ax.set_title("Google Search Index Comparison")
plt.show()
Plotting Yearly Data¶
We might want to zoom into specific years to get a closer look because it appears that there may be seasonality. To do this we will rely on two functionalities of pandas that make this effecient and easy. First, we can easily pull the year from a datetime index by calling its year attribute like below:
print(df.index.year)
Int64Index([2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004,
...
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020],
dtype='int64', name='Month', length=204)
With this, we can find a boolean index of whether or not a year matches a given value and use that to find our yearly data. First, let's see what the index looks like for matching the year 2020, then how it can be used to return just values within the year 2020.
#Match years in 2020
i = df.index.year == 2020
print(i)
print()
#Return the data that matches this condition
print(df[i])
[False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
True True True True True True True True True True True True]
Airbnb Vrbo HomeAway Booking.com
Month
2020-01-01 84.0 34 3.0 4.0
2020-02-01 82.0 30 2.0 4.0
2020-03-01 60.0 21 2.0 4.0
2020-04-01 36.0 12 1.0 2.0
2020-05-01 72.0 30 3.0 3.0
2020-06-01 100.0 41 4.0 4.0
2020-07-01 96.0 36 2.0 4.0
2020-08-01 90.0 33 2.0 4.0
2020-09-01 73.0 27 1.0 3.0
2020-10-01 62.0 22 1.0 3.0
2020-11-01 53.0 17 1.0 2.0
2020-12-01 95.0 18 1.0 2.0
Now, iterate through the years 2015-2020 to get a deeper understanding.
for year in range(2015, 2021):
#Get the data for that year only
yearly_data = df[df.index.year == year]
ax = yearly_data.plot(kind='line')
ax.set_xlabel("Year")
ax.set_ylabel("Search Index")
ax.set_title("Google Search Index Comparison")
plt.show()