Introduction
KMeans
In this short course we dive into the concepts behind the KMeans algorithm, a classic that you will meet very early in machine learning and that is used to cluster data. Given a set of data points and a number of clusters n that we choose, the algorithm tries to form n clusters such that the points within a cluster are as similar as possible and the clusters are as different from one another as possible. The structure of this course is as follows:
- Intro: We dive right into the algorithm and see how it can be used to cluster the pixels of an image using a pre-built version of the algorithm. This is just an overview to give you an idea of where we are headed; the other lessons go into much more depth.
- Building the Algorithm: We take a step back and build the algorithm ourselves to enhance our knowledge.
- Visualizing the Algorithm: We use visualization to drive home the concepts behind what exactly is happening in this clustering technique.
- Normalization: We learn the importance of normalization techniques as applied to clustering.
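To make "as similar as possible" concrete, here is a minimal sketch that scores two different groupings of the same points by their within-cluster sum of squared distances, the quantity KMeans is usually described as minimizing. The points, the labels, and the helper name within_cluster_ss are made up for illustration and are not part of the course code.

```python
import numpy as np

# Six made-up 2D points forming two visually obvious groups
points = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                   [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])

def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to its cluster's mean."""
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)
        total += ((members - centroid) ** 2).sum()
    return total

good_labels = np.array([0, 0, 0, 1, 1, 1])  # groups nearby points together
bad_labels = np.array([0, 1, 0, 1, 0, 1])   # mixes the two groups

good_score = within_cluster_ss(points, good_labels)
bad_score = within_cluster_ss(points, bad_labels)
print(good_score, bad_score)  # the tighter grouping gets the lower score
```

A lower score means the points sit closer to their cluster centers, which is the sense in which a clustering is "good" here.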
The first thing we have to worry about is simply reading in our data. In the GitHub repository for this course, you can find and download the picture called Dogs.jpg. Once you have placed it in the same folder as your notebook, we can read the image with plt.imread from matplotlib. This function reads a picture and returns a 3D array holding the values of all the pixels.
import matplotlib.pyplot as plt
#Read the image in, notice the format
img = plt.imread('Dogs.jpg')
print(img)
#The shape is 3D because it is an array with RGB colors
print(img.shape)
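If the layout of that 3D array is unclear, the following sketch builds a tiny made-up image by hand instead of loading Dogs.jpg. The name toy_img and its pixel values are assumptions for illustration; the unsigned 8-bit dtype mirrors what JPEG files usually load as.

```python
import numpy as np

# A made-up 2x2 RGB image: axis 0 is rows, axis 1 is columns,
# axis 2 holds the (R, G, B) values of each pixel
toy_img = np.array([[[255, 0, 0], [0, 255, 0]],
                    [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

print(toy_img.shape)     # (rows, columns, channels)
print(toy_img.dtype)     # 8-bit unsigned integers hold 0-255
print(toy_img[0, 1])     # the pixel in row 0, column 1 (pure green here)
print(toy_img[:, :, 0])  # the red channel alone, as a 2D array
```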
With the picture loaded as a 3D array, we might also want to see how to reverse this action. The plt.imshow function takes this data and draws the actual picture. One thing to note here: we pass vmin and vmax to state the range we assume for the RGB values, which is 0 to 255. For RGB input, matplotlib already assumes 0-255 for integer arrays and 0-1 for float arrays, and its documentation describes vmin/vmax as ignored for RGB(A) data, so these arguments mainly serve as a reminder of the range we are working in.
#The imshow function draws the image
#We pass vmin=0 and vmax=255 because each of the RGB values of a pixel can be between 0 and 255
plt.imshow(img, vmin=0, vmax=255)
plt.show()
The reason I mentioned vmin/vmax is that we are going to rescale these values to lie between 0 and 1 moving forward. Notice that when plotting now we use vmin=0, vmax=1.
#Let's standardize between 0-1 by dividing by 255
img = img / 255
#Now use vmin = 0 and vmax = 1
plt.imshow(img, vmin=0, vmax=1)
plt.show()
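As a quick check of what the division does, this sketch applies it to a small made-up array (toy_img is an assumption, not the course image). It shows that the result becomes a float array in the 0-1 range, and that multiplying back by 255 and rounding recovers the original integers.

```python
import numpy as np

# A made-up 1x2 RGB image with 8-bit integer values
toy_img = np.array([[[255, 0, 0], [0, 128, 0]]], dtype=np.uint8)

# Dividing by 255 converts the integers to floats between 0 and 1
scaled = toy_img / 255
print(toy_img.dtype, scaled.dtype)
print(scaled.min(), scaled.max())

# Multiplying back and rounding recovers the original pixel values
restored = np.round(scaled * 255).astype(np.uint8)
print(np.array_equal(restored, toy_img))
```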
Moving forward, we want to build our KMeans algorithm by taking all of the pixels (each described by its 3 RGB values) and clustering them into groups of similar pixels. To do this, we need a table with 3 columns, one for each color, and height*width rows, one for each pixel that we want to cluster. We will transform our data using reshape, but hold on to the original image shape for when we want to turn the data back into a 3D array for plotting.
#Hold the shape of the image in a variable
img_shape = img.shape
print(img_shape)
#Now let's reshape into a 2D array for feeding into our KMeans model
X = img.reshape(img_shape[0]*img_shape[1], img_shape[2])
print(X)
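It can help to see where a given pixel ends up after this reshape. The sketch below uses a small made-up array (toy_img, X_toy, and the chosen sizes are assumptions for illustration) and relies on NumPy's default row-major ordering, under which the pixel at row i and column j lands in row i*width + j of the 2D array.

```python
import numpy as np

height, width, channels = 2, 3, 3
# Fill the toy image with unique values so every pixel can be tracked
toy_img = np.arange(height * width * channels).reshape(height, width, channels)

# Same reshape as above: one row per pixel, one column per color
X_toy = toy_img.reshape(height * width, channels)
print(X_toy.shape)

# The pixel at (i, j) is found at row i*width + j of the 2D array
i, j = 1, 2
row = i * width + j
print(np.array_equal(X_toy[row], toy_img[i, j]))
```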
Notice that we now have a 2D array that we can train our KMeans model with. We can of course reshape our data back as well, by passing the three shape values we held onto:
reversed_img = X.reshape(img_shape[0],img_shape[1], img_shape[2])
print(reversed_img)
#Just as a check, we can see if the image is the same
plt.imshow(reversed_img, vmin=0,vmax=1)
plt.show()
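Besides looking at the plot, the round trip can be checked numerically. This is a minimal sketch on a random made-up image (toy_img and the other names are assumptions); it also uses -1 as a shorthand that lets NumPy work out the number of rows.

```python
import numpy as np

# A random made-up 4x5 RGB image with float values in [0, 1)
rng = np.random.default_rng(0)
toy_img = rng.random((4, 5, 3))
toy_shape = toy_img.shape

# -1 asks NumPy to infer the row count (here 4*5 = 20)
X_toy = toy_img.reshape(-1, toy_shape[2])
print(X_toy.shape)

# Reshaping with the saved shape restores the image exactly
restored = X_toy.reshape(toy_shape)
print(np.array_equal(restored, toy_img))
```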