The Google Search Index
A free tool for measuring interest in companies based on search volume is the Google search index. With this index, we can review search trends to compare the overall time trend as well as any seasonal components. An important thing to note is that these are all relative values. When we compare multiple search terms, the values are normalized to a range of 0-100 relative to one another (the terms) and to the months that we are analyzing. Our first step, as always, is reading in and cleaning the data.
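Before loading the file, it may help to see what "relative values" means in practice. The sketch below uses made-up raw counts (the names `term_a` and `term_b` and every number are hypothetical) and assumes the simplest possible scaling, where the single largest value across all terms and months becomes 100. It is an illustration of the idea, not Google's actual procedure.

```python
import pandas as pd

# Hypothetical raw search counts for two terms over four months.
# These numbers are invented for illustration only.
raw = pd.DataFrame(
    {"term_a": [200, 400, 800, 600], "term_b": [2, 40, 80, 160]},
    index=["2020-01", "2020-02", "2020-03", "2020-04"],
)

# Assumed scaling: divide by the largest value anywhere in the table
# so that the peak across all terms and months equals 100.
scaled = (raw / raw.to_numpy().max() * 100).round()
print(scaled)
```

Under this assumption, adding a more popular term or a busier month changes every number in the table, which is why values from separate comparisons cannot be mixed. A value that rounds down to zero here corresponds to the kind of entry the real data reports as "<1". The actual data is read in next.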
import pandas as pd
#Read the data
df = pd.read_csv("AirbnbSearch.csv", index_col=0)
print(df)
Something you may notice is that certain entries are not proper numbers but are instead stored as the string "<1". Below we see an example of this.
print(df.loc['2010-01-01':])
By using applymap, we apply a function to each cell within the dataframe. The problem with this data is that it mixes integers and strings, so calling a string method on an integer raises an error. To fix this, we can write a lambda function that uses a conditional expression of the form "A if X else B", where X is the condition, A is the value returned when X is true, and B is the value returned otherwise. Below, when the cell is a string we replace "<1" with ".5" and convert the result to a floating point number; otherwise we return the value unchanged because it is already numerical.
#Convert <1 to be .5 (in pandas 2.1+, applymap is deprecated in favor of DataFrame.map)
df = df.applymap(lambda x: float(x.replace("<1", ".5")) if isinstance(x, str) else x)
print(df)
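The conversion above can be checked on a small hand-made frame. This is a minimal sketch: the `sample` data and the helper name `clean_cell` are hypothetical, and the function is applied column by column with `Series.map` rather than `applymap`, purely so that it behaves the same across pandas versions.

```python
import pandas as pd

# Hypothetical sample mimicking the raw file: one column holds strings
# because it contains "<1", the other is already numeric.
sample = pd.DataFrame(
    {"term_a": ["<1", "3", "12"], "term_b": [40, 55, 100]},
    index=["2010-01-01", "2010-02-01", "2010-03-01"],
)

def clean_cell(x):
    """Convert string cells to float, treating "<1" as 0.5; pass numbers through."""
    return float(x.replace("<1", ".5")) if isinstance(x, str) else x

# Apply the function to every cell, one column at a time
cleaned = sample.apply(lambda col: col.map(clean_cell))
print(cleaned)
```

Writing the conditional expression as a named function makes it easy to test single values, such as `clean_cell("<1")` or `clean_cell(7)`, before running it over the whole dataframe.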
Another modification we need to make is converting the index of this dataframe from strings to datetime objects.
df.index = pd.to_datetime(df.index)
print(df)
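One practical benefit of a datetime index is that pandas can select rows by a partial date string. The sketch below uses a small hypothetical frame (the column `term_a` and its values are made up) to show the index type before and after conversion, and then selects a whole year with `.loc`.

```python
import pandas as pd

# Hypothetical monthly data with dates stored as plain strings
sample = pd.DataFrame(
    {"term_a": [10, 20, 30, 40]},
    index=["2019-11-01", "2019-12-01", "2020-01-01", "2020-02-01"],
)
print(type(sample.index))  # a generic Index of strings

# Convert the string labels to timestamps
sample.index = pd.to_datetime(sample.index)
print(type(sample.index))  # now a DatetimeIndex

# With a DatetimeIndex, a year string selects every row in that year
rows_2020 = sample.loc["2020"]
print(rows_2020)
```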
We will also want to see what the data actually looks like. Because we are using a dataframe, it is relatively easy to plot the result.
import matplotlib.pyplot as plt
#Plot the data
ax = df.plot(kind='line')
ax.set_xlabel("Year")
ax.set_ylabel("Search Index")
ax.set_title("Google Search Index Comparison")
plt.show()
Plotting Yearly Data
We might want to zoom into specific years to get a closer look because it appears that there may be seasonality. To do this we will rely on two features of pandas that make this efficient and easy. First, we can easily pull the year from a datetime index by calling its year attribute like below:
print(df.index.year)
With this, we can find a boolean index of whether or not a year matches a given value and use that to find our yearly data. First, let's see what the index looks like for matching the year 2020, then how it can be used to return just values within the year 2020.
#Match years in 2020
i = df.index.year == 2020
print(i)
print()
#Return the data that matches this condition
print(df[i])
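Because the year comparison returns an array of True/False values, it can be counted and combined with other conditions. The following sketch uses a hypothetical eight-month frame (the column `term_a` and its values are made up) to count the matching rows and then narrow the selection to the first quarter of 2020 using the index's `month` attribute.

```python
import pandas as pd

# Hypothetical monthly data from October 2019 through May 2020
idx = pd.date_range("2019-10-01", periods=8, freq="MS")
sample = pd.DataFrame({"term_a": [5, 6, 7, 8, 9, 10, 11, 12]}, index=idx)

# Boolean mask: True where the row falls in 2020
in_2020 = sample.index.year == 2020
print(in_2020)
print(in_2020.sum())  # True counts as 1, so this is the number of matches

# Combine masks element-wise with & (parentheses are required)
first_quarter_2020 = in_2020 & (sample.index.month <= 3)
print(sample[first_quarter_2020])
```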
Now we can iterate through the years 2015-2020 and plot each one separately to get a closer look at any seasonality.
for year in range(2015, 2021):
    #Get the data for that year only
    yearly_data = df[df.index.year == year]
    ax = yearly_data.plot(kind='line')
    ax.set_xlabel("Month")
    ax.set_ylabel("Search Index")
    ax.set_title(f"Google Search Index Comparison ({year})")
    plt.show()
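Separate plots per year can be hard to compare side by side. One possible alternative, sketched here on hypothetical data (the column `term_a` and its values are made up), is to reshape a single term into a table with one row per month and one column per year, so that each year can be read or plotted against the same month axis.

```python
import pandas as pd

# Hypothetical two years of monthly values for a single term
idx = pd.date_range("2019-01-01", periods=24, freq="MS")
sample = pd.DataFrame({"term_a": range(1, 25)}, index=idx)

# Add explicit year and month columns taken from the datetime index
tidy = sample.assign(year=sample.index.year, month=sample.index.month)

# Reshape: rows are months 1-12, columns are years
by_month = tidy.pivot(index="month", columns="year", values="term_a")
print(by_month)
```

Calling `by_month.plot()` on a table like this would draw one line per year over the months 1-12, which makes a repeating seasonal pattern easier to spot than flipping between individual yearly charts.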