Index Operations
Index Operations¶
There are some more advanced index operations we are going to explore in this section of the course. The first thing we can learn to do is grab the quarter for each date. As you can see, it will return integers between 1-4 denoting what quarter a date is in.
# You can access the quarter of the data through the quarter attribute
print(real_estate.index.quarter)
Int64Index([1, 2, 3, 4, 1, 2, 3, 4, 1, 2,
...
3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
dtype='int64', name='DATE', length=180)
Multi-index¶
Pandas allows for indices that are actually with multiple levels. One can create an index like this by calling pd.MultiIndex.from_tuples which will (given tuples) create the multi-index. For example, we can play with something basic below.
# Example of building a multi-index
multi_index = pd.MultiIndex.from_tuples([("A", 1),
("B", 2),
("C", 3)])
print(multi_index)
MultiIndex([('A', 1),
('B', 2),
('C', 3)],
)
Zip can turn two or more lists into a list of tuples like so below. We show how with the prior data.
# Show zipping
letters = ["A", "B", "C"]
numbers = [1, 2, 3]
l = list(zip(letters, numbers))
print(l)
[('A', 1), ('B', 2), ('C', 3)]
Bringing both of these concepts together, we can make a multi-index for our data by zipping the year and the quarter of the real estate data and then converting it to a multi-index.
# By using zip, we can create a multi index where the dates become the year and quarter
multi_index = pd.MultiIndex.from_tuples(zip(real_estate.index.year,real_estate.index.quarter))
print(multi_index)
MultiIndex([(1975, 1),
(1975, 2),
(1975, 3),
(1975, 4),
(1976, 1),
(1976, 2),
(1976, 3),
(1976, 4),
(1977, 1),
(1977, 2),
...
(2017, 3),
(2017, 4),
(2018, 1),
(2018, 2),
(2018, 3),
(2018, 4),
(2019, 1),
(2019, 2),
(2019, 3),
(2019, 4)],
length=180)
# Set the real estate index
real_estate.index = multi_index
print(real_estate)
Real Estate
1975 1 59.77
2 60.97
3 61.18
4 62.22
1976 1 62.90
... ...
2018 4 429.86
2019 1 434.58
2 442.71
3 447.87
4 451.65
[180 rows x 1 columns]
You can actually index into the year if you want which is really nice for quicker access.
# Get the 2017 data
print(real_estate.loc[2017])
Real Estate
1 385.83
2 395.29
3 401.97
4 405.83
This approach is going to come in very handy for converting our current CPI data to being of the frequency of quarterly. The way to accomplish this is to take our CPI dataframe, group by the year and quarter, and then we call the last() function on that. It will return to us, for each quarter, the last value that was present there.
# If we group the CPI by quarter and year, we can apply the last function to get the latest row for each
# combination of quarter and year
quarterly_CPI = CPI.groupby([CPI.index.year, CPI.index.quarter]).last()
print(quarterly_CPI)
CPI
DATE DATE
1947 1 22.000
2 22.080
3 22.840
4 23.410
1948 1 23.500
... ...
2019 1 254.095
2 255.402
3 256.593
4 258.444
2020 1 259.050
[293 rows x 1 columns]
There are also names that can be given to each level of the multi-index. For our two quarterly sets of data, we want to give them the name of year and quarter.
# Rename the indices
real_estate.index.names = ["Year", "Quarter"]
quarterly_CPI.index.names = ["Year", "Quarter"]
print(real_estate.head(5))
print()
print()
print(quarterly_CPI.head(5))
Real Estate
Year Quarter
1975 1 59.77
2 60.97
3 61.18
4 62.22
1976 1 62.90
CPI
Year Quarter
1947 1 22.00
2 22.08
3 22.84
4 23.41
1948 1 23.50
Finally, we join them together!
# Join the two together
data = real_estate.join(quarterly_CPI)
print(data)
Real Estate CPI
Year Quarter
1975 1 59.77 52.800
2 60.97 53.500
3 61.18 54.600
4 62.22 55.600
1976 1 62.90 56.000
... ... ...
2018 4 429.86 252.653
2019 1 434.58 254.095
2 442.71 255.402
3 447.87 256.593
4 451.65 258.444
[180 rows x 2 columns]
For our analysis of the real estate values, we are going to want to first find a baseline value. Let's use the end of 2000 as this value.
#Let's see what the values were at the end of 2000
print(data.loc[(1999,4)])
Real Estate 224.6
CPI 168.8
Name: (1999, 4), dtype: float64
We will find the total return since 2000 by dividing the values by this slice of the data.
#If we were to divide the data by this date and see from 2000 on, we would be able to see the group since 2000 began
returns = data / data.loc[(1999,4)]
returns = returns.loc[(2000,1):]
print(returns)
Real Estate CPI
Year Quarter
2000 1 1.019145 1.013033
2 1.035752 1.020142
3 1.054586 1.028436
4 1.070837 1.034360
2001 1 1.097240 1.043246
... ... ...
2018 4 1.913891 1.496759
2019 1 1.934907 1.505302
2 1.971104 1.513045
3 1.994078 1.520101
4 2.010908 1.531066
[80 rows x 2 columns]