Initialize
KMeans Algorithm
We have seen how to use the pre-built algorithm, but to truly understand what is happening we are going to build our own basic version. First, a broad overview of how it works. Each step gets a fuller explanation when we reach that part of the code, but overall KMeans runs as follows.
1. Initialize: Start with n initial centroids. These can be created in many different ways, but for our basic example we will pick random colors.
2. Assign: For each data point, find the closest centroid, usually by Euclidean distance (although you can substitute a different metric).
3. Update: For each cluster present in the data, update its centroid to be the mean of the points assigned to it. You will need to save the old labels before running this step.
4. Check: Stop if you have reached the specified maximum number of iterations or if the labels are unchanged from the old labels. Otherwise, go back to step 2.
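The four steps above can be sketched in a few lines of NumPy before we build them one at a time. This is only a rough outline, not the version developed in the rest of the lesson: the function name `kmeans_sketch`, its parameters, the toy data, and the choice to leave an empty cluster's centroid where it is are all assumptions made for illustration.

```python
import numpy as np

def kmeans_sketch(X, centroids, max_iter=10):
    """Hypothetical helper: run the four steps on an (n_points, n_features) array."""
    centroids = centroids.copy()          # Initialize: start from the centroids passed in
    labels = None
    for _ in range(max_iter):             # Check (a): cap on the number of iterations
        old_labels = labels               # save the old labels for the comparison below
        # Assign: Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update: move each centroid to the mean of the points assigned to it
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members) > 0:          # assumption: an empty cluster keeps its centroid
                centroids[k] = members.mean(axis=0)
        # Check (b): stop once no label changed since the previous pass
        if old_labels is not None and np.array_equal(labels, old_labels):
            break
    return centroids, labels

# Toy data: two dark points and two bright points
X_demo = np.array([[0.0, 0.0, 0.0], [0.1, 0.1, 0.1],
                   [0.9, 0.9, 0.9], [1.0, 1.0, 1.0]])
start = np.array([[0.2, 0.2, 0.2], [0.8, 0.8, 0.8]])
final_centroids, final_labels = kmeans_sketch(X_demo, start)
print(final_labels)
print(final_centroids)
```

On this toy input the two dark points should end up in one cluster and the two bright points in the other, with each centroid sitting at the mean of its pair.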
Step 1: Initialize
In the first code block, we read in the picture that we will be using again. In the second, we pick 4 colors (each has 3 values, one for each of Red/Green/Blue) from a uniform random distribution between 0 and 1. We also show the palette of colors we start with.
import matplotlib.pyplot as plt
#Read in the image
img = plt.imread('Dogs.jpg')
img = img / 255
plt.imshow(img, vmin=0, vmax=1)
plt.show()
import numpy as np
#Set the seed to make it easy to replicate
np.random.seed(0)
#Randomly choose 4 colors
colors = np.random.uniform(0,1,(4,3))
#Show the four colors
plt.imshow([colors])
plt.show()
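To see why the seed matters, here is a small check that resetting it reproduces exactly the same starting colors. The variable names `first_draw` and `second_draw` are made up for this illustration.

```python
import numpy as np

# Draw 4 RGB colors after seeding the generator
np.random.seed(0)
first_draw = np.random.uniform(0, 1, (4, 3))

# Reset the seed and draw again: the values should repeat exactly
np.random.seed(0)
second_draw = np.random.uniform(0, 1, (4, 3))

print(first_draw.shape)
print(np.array_equal(first_draw, second_draw))
```

Without the seed, each run would start from a different palette, and so could end at a different clustering.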
As we did before, we are going to reshape the array into one row per pixel so that we can run the algorithm on it.
#Hold onto the old shape
img_shape = img.shape
#Reshape the img
X = img.reshape(img_shape[0]*img_shape[1], img_shape[2])
print(X)
[[0.00392157 0.00392157 0. ]
[0.00392157 0.00392157 0. ]
[0. 0. 0. ]
...
[0.33333333 0.31764706 0.30588235]
[0.00784314 0.00392157 0. ]
[0.01176471 0.01176471 0.00392157]]
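The reason for holding onto the old shape is that the flat array of pixels can later be turned back into an image. A minimal sketch of that round trip on a tiny synthetic array (`toy_img` and the other names here are hypothetical, standing in for the real picture):

```python
import numpy as np

# A made-up 2x4 "image" with 3 color channels, values between 0 and 1
toy_img = np.arange(24, dtype=float).reshape(2, 4, 3) / 23

# Hold onto the old shape, then flatten to one row per pixel
toy_shape = toy_img.shape
flat = toy_img.reshape(toy_shape[0] * toy_shape[1], toy_shape[2])

# Reshape back to the original image dimensions
restored = flat.reshape(toy_shape)

print(flat.shape)
print(np.array_equal(restored, toy_img))
```

The flattening only changes how the pixels are indexed, not their values, so the restored array should match the original exactly.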