-
Basic Functionality 6
-
Lecture1.1
-
Lecture1.2
-
Lecture1.3
-
Lecture1.4
-
Lecture1.5
-
Lecture1.6
-
-
Pandas & Indices 4
-
Lecture2.1
-
Lecture2.2
-
Lecture2.3
-
Lecture2.4
-
-
Joins, Unions & Order By 3
-
Lecture3.1
-
Lecture3.2
-
Lecture3.3
-
-
Group By & Views 3
-
Lecture4.1
-
Lecture4.2
-
Lecture4.3
-
Pandas to SQL
Now we are going to learn how we are able to use pandas and sql together. First, we are going to set up the data, do not worry about the actual code here. The data here just represents growth given a few different equations.
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
#Create a dataset for use in our lesson, don't worry about the details on this
dates = pd.date_range(datetime(2019,1,1),datetime(2019,12,31))
base = pd.Series(range(len(dates)), index=dates)
linear1 = 1 + base/5
linear2 = 1 + base/2.5
exp1 = np.exp(base/100)
exp2 = np.exp(base/70)
log1 = 20*np.log(base+1) + 1
log2 = 20*np.log(base*2+1) + 1
data = pd.concat([linear1, linear2, exp1, exp2, log1, log2], axis=1)
data.columns = ["Linear1", "Linear2", "Exponential1", "Exponential2",
"Logarithmic1", "Logarithmic2"]
data.index.name = 'Date'
data.plot(kind='line')
plt.ylabel("Population")
plt.title("Population Growth Models")
plt.show()
As we have done before, we can check if the database has already been created and if so delete it. Then, the dataframe function to_sql() will take a table name and a connection object. This function will write the pandas dataframe to the sql database.
import sqlite3
import os
#First make sure if the database is already there that we erase it prior to this lesson
if os.path.exists("Population.db"):
os.remove("Population.db")
#Create the database
conn = sqlite3.connect('Population.db')
#To sql on a pndas dataframe takes a table name and connection then pushes this data to sql
data.to_sql("Curves", conn)
Let’s see what the data looks like, if we pull from the database.
#We can query this data and transform into a pandas dataframe like so
#Notice that the index
query = 'SELECT * FROM Curves;'
df = pd.read_sql(query, conn)
print(df.tail(5))
Similar to the first lesson, we can’t just write over the table normally. Notice how the following will fail.
#You can't do this if the table already exists normally
data.to_sql("Curves", conn)
However, it is possible to overwrite the table with this function if we change the argument if_exists to equal ‘replace’.
#But if, for example, we wanted to replace with only the first 6 months of data, we can change the
#if_exists argument to be 'replace'
data.loc[:"2019-06-30"].to_sql("Curves", conn, if_exists='replace')
query = 'SELECT * FROM Curves;'
df = pd.read_sql(query, conn)
print(df.tail(5))
There is also the option to change the argument to append which will add the data on to the table.
#Changing the argument to append allows for adding on to the latest
data.loc["2019-07-01":].to_sql("Curves", conn, if_exists='append')
query = 'SELECT * FROM Curves;'
df = pd.read_sql(query, conn)
print(df.head(5))
print()
print(df.tail(5))