Handling Dates
Handling Dates¶
There are a few date aspects to take care of with the data going forward. One nice property of datetimes is we can replace a component such as the year. This will not make the change in place, however.
from datetime import datetime
# Create an example date
test = datetime(2019,12,31)
print(test)
# Replace the year with 2018
print(test.replace(year=2018))
# But it will not change the element, it will actually return it so the variable test stays the same
print(test.year)
2019-12-31 00:00:00
2018-12-31 00:00:00
2019
Below is an example of how we can have a first date, then also have the date for the year before.
# Create the first date
date1 = datetime(2019, 12, 31)
# Find the date for the year prior
date2 = date1.replace(year=date1.year-1)
# Print the yield curves at the date
print(yield_curve.loc[date1])
print()
print()
print(yield_curve.loc[date2])
1 Mo 1.48
3 Mo 1.55
6 Mo 1.60
1 Yr 1.59
2 Yr 1.58
3 Yr 1.62
5 Yr 1.69
7 Yr 1.83
10 Yr 1.92
20 Yr 2.25
30 Yr 2.39
Name: 2019-12-31 00:00:00, dtype: float64
1 Mo 2.44
3 Mo 2.45
6 Mo 2.56
1 Yr 2.63
2 Yr 2.48
3 Yr 2.46
5 Yr 2.51
7 Yr 2.59
10 Yr 2.69
20 Yr 2.87
30 Yr 3.02
Name: 2018-12-31 00:00:00, dtype: float64
One issue when working with data that is based on trading schedules is that we can run into an issue where a date we need is missing. For example, if we take the year prior version of a date, there is a chance that it could have been on a weekend the year before.
# Below is an example of how an error could be thrown
# Create the first date
date1 = datetime(2019, 12, 30)
# Find the date for the year prior
date2 = date1.replace(year=date1.year-1)
print(yield_curve.loc[date1])
print()
print()
print(yield_curve.loc[date2])
1 Mo 1.51
3 Mo 1.57
6 Mo 1.60
1 Yr 1.57
2 Yr 1.58
3 Yr 1.59
5 Yr 1.68
7 Yr 1.81
10 Yr 1.90
20 Yr 2.21
30 Yr 2.34
Name: 2019-12-30 00:00:00, dtype: float64
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/opt/anaconda3/envs/FinanceAndPython/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 1546128000000000000
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~/opt/anaconda3/envs/FinanceAndPython/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
~/opt/anaconda3/envs/FinanceAndPython/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
~/opt/anaconda3/envs/FinanceAndPython/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
KeyError: Timestamp('2018-12-30 00:00:00')
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
~/opt/anaconda3/envs/FinanceAndPython/lib/python3.9/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
678 try:
--> 679 return Index.get_loc(self, key, method, tolerance)
680 except KeyError as err:
~/opt/anaconda3/envs/FinanceAndPython/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
KeyError: Timestamp('2018-12-30 00:00:00')
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/var/folders/y0/fwkpk2ps087b_2qxvhjstrfr0000gn/T/ipykernel_78142/4089274187.py in <module>
10 print()
11 print()
---> 12 print(yield_curve.loc[date2])
13
~/opt/anaconda3/envs/FinanceAndPython/lib/python3.9/site-packages/pandas/core/indexing.py in __getitem__(self, key)
965
966 maybe_callable = com.apply_if_callable(key, self.obj)
--> 967 return self._getitem_axis(maybe_callable, axis=axis)
968
969 def _is_scalar_access(self, key: tuple):
~/opt/anaconda3/envs/FinanceAndPython/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1200 # fall thru to straight lookup
1201 self._validate_key(key, axis)
-> 1202 return self._get_label(key, axis=axis)
1203
1204 def _get_slice_axis(self, slice_obj: slice, axis: int):
~/opt/anaconda3/envs/FinanceAndPython/lib/python3.9/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
1151 def _get_label(self, label, axis: int):
1152 # GH#5667 this will fail if the label is not present in the axis.
-> 1153 return self.obj.xs(label, axis=axis)
1154
1155 def _handle_lowerdim_multi_index_axis0(self, tup: tuple):
~/opt/anaconda3/envs/FinanceAndPython/lib/python3.9/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
3874 new_index = index[loc]
3875 else:
-> 3876 loc = index.get_loc(key)
3877
3878 if isinstance(loc, np.ndarray):
~/opt/anaconda3/envs/FinanceAndPython/lib/python3.9/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
679 return Index.get_loc(self, key, method, tolerance)
680 except KeyError as err:
--> 681 raise KeyError(orig_key) from err
682
683 def _maybe_cast_for_get_loc(self, key) -> Timestamp:
KeyError: datetime.datetime(2018, 12, 30, 0, 0)
The way that we will approach this issue is the following:
- Create a daily date range between the minimum and maximum of the yield curve dates
- Reindex the yield curve data with the new range
- Fill in the null values
For the first step, we use pd.date_range which will return the daily dates between a start and end date
# Find the start and end of the yield_curve dates
date1 = yield_curve.index.min()
date2 = yield_curve.index.max()
date_range = pd.date_range(date1, date2)
print(date_range)
DatetimeIndex(['2001-07-31', '2001-08-01', '2001-08-02', '2001-08-03',
'2001-08-04', '2001-08-05', '2001-08-06', '2001-08-07',
'2001-08-08', '2001-08-09',
...
'2020-03-09', '2020-03-10', '2020-03-11', '2020-03-12',
'2020-03-13', '2020-03-14', '2020-03-15', '2020-03-16',
'2020-03-17', '2020-03-18'],
dtype='datetime64[ns]', length=6806, freq='D')
For the second step, we use reindex for the yield_curve and this will change the index to be what we pass in. If there are not values present for a certain index, they get filled in with null.
# Re-index the yield curve
yield_curve = yield_curve.reindex(index = date_range)
print(yield_curve)
1 Mo 3 Mo 6 Mo 1 Yr 2 Yr 3 Yr 5 Yr 7 Yr 10 Yr 20 Yr \
2001-07-31 3.67 3.54 3.47 3.53 3.79 4.06 4.57 4.86 5.07 5.61
2001-08-01 3.65 3.53 3.47 3.56 3.83 4.09 4.62 4.90 5.11 5.63
2001-08-02 3.65 3.53 3.46 3.57 3.89 4.17 4.69 4.97 5.17 5.68
2001-08-03 3.63 3.52 3.47 3.57 3.91 4.22 4.72 4.99 5.20 5.70
2001-08-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ...
2020-03-14 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-03-15 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-03-16 0.25 0.24 0.29 0.29 0.36 0.43 0.49 0.67 0.73 1.10
2020-03-17 0.12 0.19 0.24 0.30 0.47 0.54 0.66 0.91 1.02 1.45
2020-03-18 0.04 0.02 0.08 0.21 0.54 0.66 0.79 1.08 1.18 1.60
30 Yr
2001-07-31 5.51
2001-08-01 5.53
2001-08-02 5.57
2001-08-03 5.59
2001-08-04 NaN
... ...
2020-03-14 NaN
2020-03-15 NaN
2020-03-16 1.34
2020-03-17 1.63
2020-03-18 1.77
[6806 rows x 11 columns]
To fill in the null values, we are going to employ the fillna function and pass it the method of ffill which means to fill values forward. This will mean that the values of yesterday will be pushed forward to fill in dates with null values.
# Fill null values
yield_curve = yield_curve.fillna(method='ffill')
print(yield_curve)
1 Mo 3 Mo 6 Mo 1 Yr 2 Yr 3 Yr 5 Yr 7 Yr 10 Yr 20 Yr \
2001-07-31 3.67 3.54 3.47 3.53 3.79 4.06 4.57 4.86 5.07 5.61
2001-08-01 3.65 3.53 3.47 3.56 3.83 4.09 4.62 4.90 5.11 5.63
2001-08-02 3.65 3.53 3.46 3.57 3.89 4.17 4.69 4.97 5.17 5.68
2001-08-03 3.63 3.52 3.47 3.57 3.91 4.22 4.72 4.99 5.20 5.70
2001-08-04 3.63 3.52 3.47 3.57 3.91 4.22 4.72 4.99 5.20 5.70
... ... ... ... ... ... ... ... ... ... ...
2020-03-14 0.33 0.28 0.38 0.38 0.49 0.58 0.70 0.89 0.94 1.31
2020-03-15 0.33 0.28 0.38 0.38 0.49 0.58 0.70 0.89 0.94 1.31
2020-03-16 0.25 0.24 0.29 0.29 0.36 0.43 0.49 0.67 0.73 1.10
2020-03-17 0.12 0.19 0.24 0.30 0.47 0.54 0.66 0.91 1.02 1.45
2020-03-18 0.04 0.02 0.08 0.21 0.54 0.66 0.79 1.08 1.18 1.60
30 Yr
2001-07-31 5.51
2001-08-01 5.53
2001-08-02 5.57
2001-08-03 5.59
2001-08-04 5.59
... ...
2020-03-14 1.56
2020-03-15 1.56
2020-03-16 1.34
2020-03-17 1.63
2020-03-18 1.77
[6806 rows x 11 columns]