-
Basic Functionality 6
-
Lecture1.1
-
Lecture1.2
-
Lecture1.3
-
Lecture1.4
-
Lecture1.5
-
Lecture1.6
-
-
Pandas & Indices 4
-
Lecture2.1
-
Lecture2.2
-
Lecture2.3
-
Lecture2.4
-
-
Joins, Unions & Order By 3
-
Lecture3.1
-
Lecture3.2
-
Lecture3.3
-
-
Group By & Views 3
-
Lecture4.1
-
Lecture4.2
-
Lecture4.3
-
Updating the Model
For this final part of lesson 2, we are going to see how we could add in functionality to allow for updating the tables with new data based on the models. First, we would need to find the maximum date in our table to know where we are starting from.
#Find the maximum date
query = 'SELECT MAX(Date) FROM Curves'
print(cur.execute(query).fetchall())
The date is returned as a string within a nested list, so let’s convert it to date by grabbing the first element of the first element (because it is a nested list) and converting to datetime with pd.to_datetime().
#Convert to datetime
query = 'SELECT MAX(Date) FROM Curves'
latest_date = pd.to_datetime(cur.execute(query).fetchall()[0][0])
print(latest_date)
It is easy to add a day to this amount if we use pd.Timedelta(), like so:
#Remember that you can use pd.Timedelta to add to dates
print(latest_date+pd.Timedelta("1D"))
Now, let’s start with the function we want to build. First create the functiona as though it would just return the next day, and the day N days after.
#First create the functiona as though it would just return the next day, and the day N days after
def update_model(date, n_days):
start_date = date + pd.Timedelta("1D")
end_date = date + pd.Timedelta("{}D".format(n_days))
print(start_date, end_date)
update_model(latest_date, 5)
Change the function so that we transform the two dates into a range of dates with pd.date_range()
#Change the function to return the date range
def update_model(date, n_days):
start_date = date + pd.Timedelta("1D")
end_date = date + pd.Timedelta("{}D".format(n_days))
dates = pd.date_range(start_date,end_date)
base = pd.Series(range(len(dates)), index=dates)
print(base)
update_model(latest_date, 5)
Let’s also review the idea of differences in datetimes. Subtracting two dates returns the time between the two.
#Recall what subtracting two dates looks like
start_date = datetime(2019,1,1)
current_date = datetime(2020,1,1)
delta = current_date - start_date
print(delta)
If you use the attribute days, you’ll be able to snag the number of days involved in the difference of times.
#You can access the days by using the days attribute
print(delta.days)
This same sort of logic works on date ranges as well. For example, the following date ranges can be converted to measure the number of days since a starting date.
#You can expand this to finding the difference between a date range a given days
date_range = pd.date_range(datetime(2020,1,1),datetime(2020,1,5))
#This gives the difference in time
print(date_range - datetime(2019,1,1))
print()
print()
#And this takes the integer number of days difference in time
print((date_range - datetime(2019,1,1)).days)
Now, update our function to have the values be equal to the number of days since the start date of January 1, 2019.
#Update the model to have values equal to the number of days since the start date
def update_model(date, n_days):
start_date = date + pd.Timedelta("1D")
end_date = date + pd.Timedelta("{}D".format(n_days))
dates = pd.date_range(start_date,end_date)
base_values = (pd.date_range(start_date,end_date) - datetime(2019,1,1)).days
base = pd.Series(base_values, index=dates)
print(base)
update_model(latest_date, 5)
And with this, we can add on the prior code which made the population model. Once again, you don’t need to worry about the specifics.
#Add in the population models
def update_model(date, n_days):
start_date = date + pd.Timedelta("1D")
end_date = date + pd.Timedelta("{}D".format(n_days))
dates = pd.date_range(start_date,end_date)
base_values = (pd.date_range(start_date,end_date) - datetime(2019,1,1)).days
base = pd.Series(base_values, index=dates)
linear1 = 1 + base/5
linear2 = 1 + base/2.5
exp1 = np.exp(base/100)
exp2 = np.exp(base/70)
log1 = 20*np.log(base+1) + 1
log2 = 20*np.log(base*2+1) + 1
data = pd.concat([linear1, linear2, exp1, exp2, log1, log2], axis=1)
data.columns = ["Linear1", "Linear2", "Exponential1", "Exponential2",
"Logarithmic1", "Logarithmic2"]
data.index.name = 'Date'
return data
new_data = update_model(latest_date, 5)
print(new_data)
Append the data on the table.
#Append the data
new_data.to_sql("Curves", conn, if_exists='append')
And finally, check to make sure the data is in.
query = 'SELECT * FROM Curves;'
df = pd.read_sql(query, conn)
df.tail(10)