Machine Learning
Solution
for x in df.columns:
    df[x] = (df[x] - df[x].mean())/df[x].std(ddof=0)
print(df)
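As a quick sanity check, each column should have a mean of about 0 and a standard deviation of about 1 after this loop runs. The sketch below repeats the same loop on a small made-up DataFrame; the column names and values are hypothetical stand-ins, not the data used in this project.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: rows are stocks, columns are betas
df = pd.DataFrame({
    "beta_market": [0.8, 1.2, 1.0, 1.5],
    "beta_size": [0.1, -0.3, 0.4, 0.0],
})

# Same standardization loop as in the solution
for x in df.columns:
    df[x] = (df[x] - df[x].mean()) / df[x].std(ddof=0)

# Per-column mean and population standard deviation after standardizing
col_means = df.mean()
col_stds = df.std(ddof=0)
print(col_means)
print(col_stds)
```

Using ddof=0 in both places matters here: pandas defaults to ddof=1, so calling df.std() without it would give values slightly different from 1.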
Now, let’s import our machine learning library. We are going to use KMeans from scikit-learn for clustering.
from sklearn.cluster import KMeans
We make a model by writing KMeans(n_clusters=n), where n is the number of groups, or clusters, we want to have. Let’s make 8 clusters.
model = KMeans(n_clusters=8)
Once it is fitted to our data, the model sorts the stocks into 8 groups so that the stocks within each group are as similar to each other as possible, and as different as possible from the stocks in other groups.
model = model.fit(df)
Add this line after fitting the model.
print(model.cluster_centers_)
A cluster center is the central point of a group, and it is the point we measure distance from. For example, if a stock’s betas were exactly equal to a cluster center, its distance to that center would be 0. This is only a quick look at how clustering works; underneath, it is an optimization problem involving these distances. We don’t need to know the cluster centers for this project, though.
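To make the idea of distance to a center concrete, here is a minimal sketch that computes the Euclidean distance from every row to every cluster center and checks that each row is labeled with its nearest center. The data is random and purely illustrative, and the choice of 4 clusters, n_init=10, and random_state=0 are assumptions made for this example only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: 40 "stocks" with 3 standardized betas each
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))

model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Distance from every row to every center, shape (40, 4)
diffs = X[:, None, :] - model.cluster_centers_[None, :, :]
distances = np.linalg.norm(diffs, axis=2)

# Index of the closest center for each row
nearest = distances.argmin(axis=1)
print(nearest)
print(model.labels_)
```

The two printed arrays should agree, since a fitted model labels each row with the index of its closest center. The same distance matrix is also available from model.transform(X).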
We can see which group each stock has been sorted into with the line of code below.
print(model.labels_)
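The labels come back as a plain array in the same order as the rows of the data, so it can help to attach them to the stock names. The sketch below assumes the DataFrame index holds the stock names; the tickers, column names, and values are made up for illustration, and 3 clusters are used only to keep the example small.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical standardized betas, indexed by made-up tickers
df = pd.DataFrame(
    {
        "beta_a": [-1.2, -1.0, 0.1, 0.0, 1.1, 1.3],
        "beta_b": [-1.1, -1.3, 0.0, 0.2, 1.2, 1.0],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
)

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df)

# Pair each label with its stock name (labels follow the row order of df)
groups = pd.Series(model.labels_, index=df.index, name="cluster")

print(groups.sort_values())    # stocks listed group by group
print(groups.value_counts())   # how many stocks landed in each group
```

The label numbers themselves carry no meaning; they only say which stocks ended up together, and the numbering can change between runs unless random_state is fixed.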
Challenge