Updating the Dataframe Part 2

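import pandas as pd
# Index by ticker so rows line up with df (the per-ticker data built earlier)
SP500.index = SP500["Ticker"]
SP500 = pd.concat([SP500, df], axis=1, join="inner")
print(SP500)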
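The effect of join="inner" is easier to see on toy data. This is a minimal sketch: the frames `constituents` and `features` and their tickers are hypothetical stand-ins for SP500 and df, with one ticker deliberately missing from the second frame to mimic a stock that could not be downloaded.

```python
import pandas as pd

# Hypothetical stand-ins: a tiny constituents table and a tiny per-ticker
# feature table. "CCC" is missing from features, like a stock that failed
# to download.
constituents = pd.DataFrame(
    {"Ticker": ["AAA", "BBB", "CCC"],
     "GICS Sector": ["Financials", "Energy", "Energy"]}
)
features = pd.DataFrame({"Group": [0, 2]}, index=["AAA", "BBB"])

# Index the constituents by ticker so both frames share the same row labels
constituents.index = constituents["Ticker"]

# axis=1 places the columns side by side, aligned on the index;
# join="inner" keeps only tickers present in both frames
merged = pd.concat([constituents, features], axis=1, join="inner")
print(merged)

# For contrast, join="outer" keeps every ticker and fills gaps with NaN
outer = pd.concat([constituents, features], axis=1, join="outer")
print(outer)
```

The inner result has two rows, while the outer result keeps "CCC" with a missing "Group" value that would then have to be dropped or filled by hand.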

We use join="inner" to keep only the tickers present in both datasets. This is necessary because a few were dropped during the download process (not every stock is available on Quandl). To see all the unique values in a column, we can use the unique() method. Let's see which industries we are working with.

print(SP500["GICS Sector"].unique())

We can also get the count for each unique value with value_counts().

print(SP500["GICS Sector"].value_counts())
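The difference between the two methods is easiest to see side by side. A small sketch on a hypothetical Series of five sector labels:

```python
import pandas as pd

# Hypothetical sector labels for five stocks
sectors = pd.Series(["Energy", "Financials", "Energy", "Utilities", "Energy"])

# unique() returns each distinct value once, in order of first appearance
distinct = sectors.unique()
print(distinct)

# value_counts() tallies how often each distinct value occurs,
# sorted from most to least frequent
counts = sectors.value_counts()
print(counts)
```

unique() answers "which values exist?", while value_counts() also answers "how many of each?".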

Next, we want to count how many stocks from each group fall within each industry sector. Let's start with a single sector.

print(SP500[SP500["GICS Sector"]=="Financials"]["Group"].value_counts())

This line filters the dataframe down to the stocks in the Financials sector and then returns the value counts of its "Group" column.
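The filter works in two steps: the comparison builds a boolean mask, and indexing with that mask keeps only the matching rows. The sketch below uses a hypothetical four-row frame and also shows the equivalent single-step `.loc` selection, which is an alternative form rather than what the line above uses.

```python
import pandas as pd

# Hypothetical mini version of SP500 with a sector and a cluster group
df = pd.DataFrame({
    "GICS Sector": ["Financials", "Energy", "Financials", "Financials"],
    "Group": [1, 4, 1, 6],
})

# Step 1: boolean mask, True where the sector is Financials
mask = df["GICS Sector"] == "Financials"

# Step 2: keep only those rows, then pick the "Group" column
fin_groups = df[mask]["Group"]

# Equivalent single-step selection with .loc (row mask, column label)
fin_groups_loc = df.loc[mask, "Group"]

print(fin_groups.value_counts())
```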

Only 3 of the 8 groups are present, which is not surprising: stocks in the same industry tend to behave in similar ways. Let's do this for every industry.

for x in SP500["GICS Sector"].unique():
    print(x)  # label each block of counts with its sector name
    print(SP500[SP500["GICS Sector"]==x]["Group"].value_counts())
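The same sector-by-group counts can also be collected into a single table. This sketch swaps in pandas' `pd.crosstab`, which the loop above does not use, and runs it on hypothetical toy data; unlike the loop, it also shows zeros for groups that never appear in a sector.

```python
import pandas as pd

# Hypothetical toy data: five stocks with a sector and a cluster group
df = pd.DataFrame({
    "GICS Sector": ["Financials", "Energy", "Financials", "Energy", "Utilities"],
    "Group": [1, 4, 1, 2, 4],
})

# One row per sector, one column per group, each cell a stock count
table = pd.crosstab(df["GICS Sector"], df["Group"])
print(table)

# Number of distinct groups that actually occur in each sector
groups_per_sector = (table > 0).sum(axis=1)
print(groups_per_sector)
```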

Now let's find the industries for each group. This time we will use groupby(). This function creates groupings based on whatever column name we give it, and from there we can do things like find the value counts. It is much simpler than the loop we used before.

print(SP500.groupby("Group")["GICS Sector"].value_counts())
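The result of this call is a Series indexed by (group, sector) pairs. A minimal sketch on hypothetical toy data shows that shape, plus an optional `unstack()` step (an addition, not part of the original code) that pivots it into a group-by-sector grid:

```python
import pandas as pd

# Hypothetical toy data: five stocks with a sector and a cluster group
df = pd.DataFrame({
    "GICS Sector": ["Financials", "Energy", "Financials", "Energy", "Utilities"],
    "Group": [1, 4, 1, 2, 4],
})

# For each group, count how many stocks come from each sector.
# The result is a Series with a (Group, GICS Sector) MultiIndex.
counts = df.groupby("Group")["GICS Sector"].value_counts()
print(counts)

# unstack() turns the inner index level (sector) into columns;
# fill_value=0 marks sector/group combinations that never occur
wide = counts.unstack(fill_value=0)
print(wide)
```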
