Getting Stock Data
The first thing we need to do is install pandas-datareader.
!pip install pandas-datareader
pandas-datareader takes a start date, an end date, the website to download from, and the stock or stocks we want. We define the dates with the datetime library, passing it the year, month, and day, in that order.
import datetime
startDate = datetime.datetime(2010, 1, 1)
endDate = datetime.datetime(2017, 5, 1)
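As a quick sanity check on the argument order, here is a minimal sketch that rebuilds the same two dates and prints how many days lie between them. The variable name span is only for illustration.

```python
import datetime

# Arguments are year, month, day, in that order.
startDate = datetime.datetime(2010, 1, 1)
endDate = datetime.datetime(2017, 5, 1)

# Subtracting two datetimes gives a timedelta, which is handy for
# checking that the range is the one we intended.
span = endDate - startDate
print(startDate.date(), "to", endDate.date(), "covers", span.days, "days")
```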
Now let’s download some Ford stock data. The first line imports the data reader. The second line takes the stock ticker as the first argument, followed by the start and end dates.
import pandas_datareader as pdr
stock = pdr.get_data_yahoo("F", startDate, endDate)
The stock object is a pandas dataframe, so we can print it out to see what it looks like.
print(stock)
The first column is our index. In a pandas dataframe, the index acts as the identifier for each row. It could be numbers like 0, 1, 2…, but in this case the index is the date, so each row is identified by its trading day.
The other columns describe the stock on that day using standard stock terms. High is the highest price of the day, Low is the lowest, Open is the price it started at, Close is the price it ended at, Adj Close is the closing price adjusted for corporate actions such as dividends and splits, and Volume is how many shares were traded that day.
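To see how a date index behaves without downloading anything, here is a small sketch using a hand-built dataframe. The numbers are made up for illustration and are not real Ford data; only the column names mirror the printout described above.

```python
import pandas as pd

# Hypothetical values, made up for illustration only.
dates = pd.to_datetime(["2010-01-04", "2010-01-05"])
sample = pd.DataFrame(
    {"High": [10.6, 11.0], "Low": [10.1, 10.4], "Volume": [1000, 1500]},
    index=dates,
)

# The index is a DatetimeIndex rather than 0, 1, 2, ...
print(sample.index)

# .loc looks a row up by its index label, which here is a date.
row = sample.loc[pd.Timestamp("2010-01-05")]
print(row["High"])
```

Because the index holds dates, each row can be looked up by the day it describes instead of by its position.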
In pandas, we can also get a certain column by indexing like below. Just pass in the column name.
print(stock["Adj Close"])
We can also select multiple columns by passing in a list of column names, like below.
print(stock[["Close","Open"]])
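The difference between the two forms of indexing is easier to see on a tiny example. The sketch below uses made-up prices (not real Ford data) and shows that a single column name gives back a Series, while a list of names gives back a dataframe.

```python
import pandas as pd

# Hypothetical prices, made up for illustration only.
dates = pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"])
sample = pd.DataFrame(
    {"Open": [10.0, 10.5, 11.0], "Close": [10.4, 10.9, 10.8]},
    index=dates,
)

one_column = sample["Close"]             # a single name returns a Series
two_columns = sample[["Close", "Open"]]  # a list of names returns a DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(list(two_columns.columns))
```

Note that the columns come back in the order given in the list, not the order they had in the original dataframe.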
There are also functions we can apply to the dataframe. For example, we can get the percent change from the previous row like so.
print(stock.pct_change())
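To make the calculation concrete, here is a short sketch on three made-up prices. It compares pct_change against the same quantity worked out by hand as (current - previous) / previous, using shift to line each row up with the one before it. The names prices, changes and manual are only for illustration.

```python
import pandas as pd

# Hypothetical closing prices, made up for illustration only.
prices = pd.Series([100.0, 110.0, 99.0])

# pct_change gives the fractional change from the previous row.
changes = prices.pct_change()

# The same calculation by hand: shift(1) moves every value down one row,
# so each price is paired with the price before it.
previous = prices.shift(1)
manual = (prices - previous) / previous

print(changes)
print(manual)
```

The values are fractions rather than percentages, so a move from 100 to 110 shows up as roughly 0.1. The first row has no previous price, so it comes out as NaN.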
Now, let’s create a new object holding the percent change of the adjusted close. Because we select a single column, the result is a pandas Series rather than a full dataframe.
df = stock.pct_change()["Adj Close"]
print(df)
The final step is getting rid of that first row. It’s NaN because there is no previous data before it. We will do this with a function that drops any NaN values. By default, it returns a new object with the NaN values removed and does not mutate the original. To mutate the original (we could also set df equal to the result instead), we add the argument “inplace=True”.
df.dropna(inplace=True)
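The two ways of dropping NaN values can be compared side by side. This is a minimal sketch on a made-up Series that has the same leading NaN that pct_change produces. The names returns and cleaned are only for illustration.

```python
import pandas as pd

# Hypothetical returns, made up for illustration only.
returns = pd.Series([float("nan"), 0.10, -0.10])

# Without inplace, dropna returns a new Series and leaves `returns` alone.
cleaned = returns.dropna()
print(len(returns), len(cleaned))

# With inplace=True, `returns` itself is modified and the call returns None.
result = returns.dropna(inplace=True)
print(len(returns), result)
```

Either approach ends with the NaN row gone. Assigning the result to a variable keeps the original around, while inplace=True overwrites it.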