Regression
Solution
# Assumes sm is statsmodels.formula.api and df is the DataFrame built earlier
def regress(stock):
    result = sm.ols(formula=stock+" ~ Index + Oil + Gold + NaturalGas", data=df).fit()
    print(result.summary())
regress("F")
Now, I’m going to change the function to print the p-values of our results.
def regress(stock):
    result = sm.ols(formula=stock+" ~ Index + Oil + Gold + NaturalGas", data=df).fit()
    print(result.pvalues)
regress("AAPL")
These values are important because they tell us which coefficients we can treat as statistically significant. If a coefficient is not significant, we cannot reject the hypothesis that its true value is 0, or in other words that the variable has no effect holding all else constant. We might, for example, expect gold prices to affect all stocks, but when we control for the market we notice that the reason gold is correlated with all these stocks is that it is also correlated with the market! This gives us a sharper picture, since we see which firms really get affected by gold versus which just get affected by the overall market environment.
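To see why controlling for the market matters, here is a minimal sketch on simulated data (not the df from this post). The numbers, the variable names, and the helper ols_coefs are all made up for illustration, and it uses NumPy least squares instead of statsmodels so that it runs on its own. The stock is built to depend only on the market, while gold is built to be correlated with the market.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated (hypothetical) returns: gold moves with the market,
# and the stock depends only on the market, never directly on gold.
market = rng.normal(size=n)
gold = 0.6 * market + 0.8 * rng.normal(size=n)
stock = 1.2 * market + 0.5 * rng.normal(size=n)

def ols_coefs(y, *regressors):
    """Least-squares coefficients, intercept first (illustrative helper)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

# Regress the stock on gold alone, then on the market and gold together.
gold_alone = ols_coefs(stock, gold)[1]
gold_controlled = ols_coefs(stock, market, gold)[2]

print("Gold beta without the market:", round(gold_alone, 3))
print("Gold beta controlling for the market:", round(gold_controlled, 3))
```

In this setup the gold beta should come out well above zero when gold is the only regressor, and close to zero once the market is included, which is the same pattern described above.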
Let’s change the function again. Now we are going to get True/False values for whether or not each p-value is below 5%.
def regress(stock):
    result = sm.ols(formula=stock+" ~ Index + Oil + Gold + NaturalGas", data=df).fit()
    print(result.pvalues<.05)
regress("PXD")
We get True for the index and oil prices for PXD since their p-values are both under 5%.
We can also access the betas with the params attribute, so let’s print those out too.
def regress(stock):
    result = sm.ols(formula=stock+" ~ Index + Oil + Gold + NaturalGas", data=df).fit()
    print(result.pvalues<.05)
    print(result.params)
regress("PXD")
Now, for the cool part. If we have something like the first printout, a Series of True/False values, we can use it to index. In pandas, wherever we give a True value in the index we keep the value, and wherever we give a False value we exclude it.
def regress(stock):
    result = sm.ols(formula=stock+" ~ Index + Oil + Gold + NaturalGas", data=df).fit()
    truthSeries = result.pvalues<.05
    print(result.params[truthSeries])
regress("PXD")
Now we only get back the index beta and the oil beta!
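If you want to try the indexing trick without running a regression, here is a small standalone sketch. The betas and p-values below are invented purely for illustration (they are not the actual PXD results); only the pattern of which ones fall under 5% mirrors the example above.

```python
import pandas as pd

# Hypothetical betas and p-values, made up for illustration only
params = pd.Series(
    {"Intercept": 0.01, "Index": 1.10, "Oil": 0.40, "Gold": 0.02, "NaturalGas": -0.03}
)
pvalues = pd.Series(
    {"Intercept": 0.40, "Index": 0.001, "Oil": 0.01, "Gold": 0.60, "NaturalGas": 0.30}
)

# Comparing a Series to a number gives a Series of True/False values
truthSeries = pvalues < 0.05

# Indexing with that Series keeps only the rows where it is True
significant = params[truthSeries]
print(significant)
```

Because both Series share the same labels, pandas lines them up by name, so the mask built from the p-values picks out the matching betas.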
Challenge