Preparing the Data
We are going to import the file we used last time, but we also want to fill in the NaN values. With fillna() we choose what replaces each NaN; in our case it will be 0.
import pandas as pd
# Load the CSV, using its first column as the row index
df = pd.read_csv("RegressionMatrix1.csv", index_col=0, encoding="UTF-8")
# Replace every NaN with 0
df = df.fillna(0)
print(df)
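To see what fillna(0) does without the course file, here is a minimal sketch on a small made-up DataFrame. The tickers and row labels are invented for illustration and are not taken from RegressionMatrix1.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the regression matrix
toy = pd.DataFrame(
    {"AAPL": [1.2, np.nan], "MSFT": [np.nan, 0.8]},
    index=["beta_market", "beta_oil"],
)

# fillna returns a new DataFrame; the original is left unchanged
filled = toy.fillna(0)
print(filled)
```

Because fillna() returns a new object by default, the result has to be assigned back to a variable, which is why the lesson writes df = df.fillna(0).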
We also want a different orientation, where the beta indicators are the columns. For that, we need to transpose the DataFrame.
df = df.transpose()
print(df)
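As a rough illustration of the transpose, the sketch below assumes a layout with beta indicators as rows and tickers as columns. The labels and numbers are hypothetical, chosen only to show how the axes swap.

```python
import pandas as pd

# Hypothetical layout: beta indicators as rows, tickers as columns
toy = pd.DataFrame(
    {"AAPL": [1.2, 0.0], "MSFT": [0.0, 0.8], "XOM": [0.9, 1.5]},
    index=["beta_market", "beta_oil"],
)

# transpose() swaps rows and columns; toy.T is shorthand for the same thing
flipped = toy.transpose()
print(flipped.shape)
print(list(flipped.columns))
```

After the transpose, each row is one ticker and each column is one beta indicator, so column-wise operations such as mean() and std() work per indicator.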
Now we need to standardize the data. We are going to use the z-score of each value, which measures how many standard deviations the value is away from its column mean. So we take the value minus the column mean, and divide that by the column standard deviation.
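To make the definition concrete before trying it in pandas, here is a small sketch that computes z-scores for a single list of made-up numbers using only the Python standard library. It assumes the population standard deviation (dividing by n).

```python
from statistics import mean, pstdev

# Made-up numbers standing in for one column of the data
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = mean(values)       # column mean
sigma = pstdev(values)  # population standard deviation (divides by n)

# z-score: (value - mean) / standard deviation
z_scores = [(v - mu) / sigma for v in values]
print(z_scores)
```

A value equal to the mean gets a z-score of 0, and a value one standard deviation above the mean gets a z-score of 1.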
Challenge
There is a std() function in pandas that returns the column standard deviation; try to use it to get the z-score for each value. This might be tough, so don’t spend too much time on it; you’ll see it’s easy once you know how to do it. Also, you should use std(ddof=0) to set the delta degrees of freedom to 0, which gives the population standard deviation (pandas defaults to ddof=1).
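As a hint on the ddof argument only, the sketch below compares the pandas default with std(ddof=0) on a made-up Series. It does not solve the challenge itself.

```python
import pandas as pd

# Made-up numbers, not from the course data
s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_std = s.std()            # pandas default is ddof=1 (divides by n - 1)
population_std = s.std(ddof=0)  # divides by n

print(sample_std, population_std)
```

The two results differ noticeably on small samples, so it matters which one goes into the z-score.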