Data Processing
Data Processing¶
For our report, we are going to actually have it be modular to a given year and quarter to be defined over. One thing we are going to need to know how to do is to grab the first day of the next quarter.
If we want the next quarter we can just add 1, except for 4 where we want that to become 1. If we do the modulo function we can achieve this. Below shows the example of this for all four quarters.
In [4]:
# Grab each next quarter
for quarter in range(1,5):
print(quarter % 4 + 1)
2
3
4
1
In [5]:
#Define a function that processes the data given the year/quarter we want to use as of
def process_data_yield_curve(yield_curve: DataFrame, year: int, quarter: int) -> DataFrame:
"""
The function for processing of the yield curve data
Parameters
----------
yield_curve : DataFrame
The raw yield curve data
year : int
The year for the report
quarter : int
The quarter for the report
Returns
-------
DataFrame
Yield table
"""
# Copy the yield curve to avoid overwriting
yield_curve = yield_curve.copy()
#Find the next quarter
end_quarter = quarter % 4 + 1
#Convert to the month
end_month = (end_quarter-1) * 3 + 1
# Find the end year which is the same unless the quarter is 4
end_year = year
if quarter == 4:
end_year += 1
# Finally create the datetime and subtract one day from it for the last day
end_date = datetime(end_year, end_month, 1) - pd.Timedelta("1D")
# Index the yield
yield_curve = yield_curve.loc[:end_date]
# Create the yield table
yield_table = create_yield_table(yield_curve, end_date)
return yield_table
yield_table = process_data_yield_curve(yield_curve, 2019, 1)
print(yield_table)
Current 1 Year Ago Curve 5 Years Ago Curve 1 Year Change \
1 Mo 2.43 1.63 0.03 0.80
3 Mo 2.40 1.73 0.05 0.67
6 Mo 2.44 1.93 0.07 0.51
1 Yr 2.40 2.09 0.13 0.31
2 Yr 2.27 2.27 0.44 0.00
3 Yr 2.21 2.39 0.90 -0.18
5 Yr 2.23 2.56 1.73 -0.33
7 Yr 2.31 2.68 2.30 -0.37
10 Yr 2.41 2.74 2.73 -0.33
20 Yr 2.63 2.85 3.31 -0.22
30 Yr 2.81 2.97 3.56 -0.16
5 Year Change
1 Mo 2.40
3 Mo 2.35
6 Mo 2.37
1 Yr 2.27
2 Yr 1.83
3 Yr 1.31
5 Yr 0.50
7 Yr 0.01
10 Yr -0.32
20 Yr -0.68
30 Yr -0.75
Likewise, we build out the same type of function to process our real estate data. It will return to us the returns, the quarterly returns, and the returns adjusted for inflation.
In [6]:
#Define a similar function for the real estate data
def process_data_real_estate(data: DataFrame, year: int, quarter: int) -> Tuple[DataFrame, DataFrame, DataFrame]:
"""
Function which process the real estate data
Parameters
----------
data : DataFrame
Real estate data
year : int
The year for the last quarter to use
quarter : int
The last quarter to use
Returns
-------
Tuple[DataFrame, DataFrame, DataFrame]
1. Dataframe of returns
2. Quarterly returns
3. Quarterly returns adjusted for inflation
"""
# Divide all data by the starting values at the end of 2000
returns = data / data.loc[(1999,4)]
# Only use 2000 and on
returns = returns.loc[(2000,1):]
# Add adjustments for inflation
returns["Real Estate Inflation Adjusted"] = returns["Real Estate"] / returns["CPI"]
# Cut the data to be only up to the last quarter/year requested
returns = returns.loc[:(year, quarter)]
# Get the percentage changes for real estate
quarterly_returns = data["Real Estate"].pct_change()
quarterly_returns = quarterly_returns.loc[(2000,1):(year, quarter)]
quarterly_returns = quarterly_returns.unstack()
# Get the adjusted quarterly returns
quarterly_returns_adj = returns["Real Estate Inflation Adjusted"].pct_change()
#Because there is no baseline to compare to, the percent change will be negative in the first year
#We can set it to be equal to the index - 1
quarterly_returns_adj.iloc[0] = returns["Real Estate Inflation Adjusted"].iloc[0] - 1
quarterly_returns_adj = quarterly_returns_adj.unstack()
return returns, quarterly_returns, quarterly_returns_adj
# Run our function and check the outputs
returns_RE, quarterly_returns_RE, quarterly_returns_adj_RE = process_data_real_estate(real_estate_data, 2019, 1)
print(returns_RE.head(5))
print()
print(quarterly_returns_RE.head(5))
print()
print(quarterly_returns_adj_RE.head(5))
Real Estate CPI Real Estate Inflation Adjusted
Year Quarter
2000 1 1.019145 1.013033 1.006033
2 1.035752 1.020142 1.015302
3 1.054586 1.028436 1.025427
4 1.070837 1.034360 1.035265
2001 1 1.097240 1.043246 1.051755
Quarter 1 2 3 4
Year
2000 0.019145 0.016295 0.018183 0.015410
2001 0.024656 0.016759 0.015525 0.012418
2002 0.014362 0.017029 0.019829 0.014168
2003 0.012187 0.012184 0.015411 0.028326
2004 0.015643 0.024409 0.038143 0.021346
Quarter 1 2 3 4
Year
2000 0.006033 0.009213 0.009972 0.009594
2001 0.015928 0.007604 0.013244 0.016413
2002 0.008111 0.010800 0.013060 0.008589
2003 0.000629 0.016607 0.004439 0.026108
2004 0.006958 0.014648 0.033221 0.011223