Visualizing the Iterations
Now let’s say that we are once again starting from a random group of centroids. Let’s re-define a function for finding the groups, since we now need to do this ourselves. We’ll check it with a dummy point to make sure the output makes sense.
In [8]:
def find_groups(centroids, X):
    #Get the distance from every point to each centroid
    distance = [((X - c) ** 2).sum(axis=1) ** .5 for c in centroids]
    #Stack the distances (one row per centroid)
    distance = np.vstack(distance)
    #Find the groups (index of the nearest centroid)
    groups = distance.argmin(axis=0)
    return groups

#Define centroids
centroids = [[-10, 20],
             [15, 7.5],
             [20, 20]]

#Test a point out
print(find_groups(centroids, np.array([[-10.1, 20.1]])))
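As a further sanity check, here is a minimal sketch that probes one point near each starting centroid. The three probe points are made up for illustration, and `find_groups` is repeated so that the block runs on its own.

```python
import numpy as np

def find_groups(centroids, X):
    # Euclidean distance from every row of X to each centroid
    distance = [((X - c) ** 2).sum(axis=1) ** .5 for c in centroids]
    # One row per centroid, one column per point
    distance = np.vstack(distance)
    # Index of the closest centroid for each point
    return distance.argmin(axis=0)

centroids = [[-10, 20], [15, 7.5], [20, 20]]

# Hypothetical probe points, each placed right next to one centroid
probes = np.array([[-10.1, 20.1], [15.2, 7.0], [19.5, 20.5]])
probe_groups = find_groups(centroids, probes)
print(probe_groups)
```

Each probe should land in the group of the centroid it sits next to, so the groups come back in the order 0, 1, 2.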
With that in mind, we can find the labels for every point on the grid and then shade the resulting regions behind the data.
In [9]:
#Find all the labels in the grid
z = find_groups(centroids, np.vstack([x_grid.ravel(), y_grid.ravel()]).T)
z = z.reshape(x_grid.shape)
print(z)
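To make the shape of this label array concrete, here is a small sketch on a coarse 5 x 5 grid. The grid resolution and the names `xs`, `ys`, `small_x`, `small_y` and `small_z` are assumptions for illustration; the notebook's own `x_grid` and `y_grid` were built earlier and are presumably much finer.

```python
import numpy as np

def find_groups(centroids, X):
    # Euclidean distance from every row of X to each centroid
    distance = [((X - c) ** 2).sum(axis=1) ** .5 for c in centroids]
    distance = np.vstack(distance)
    # Index of the closest centroid for each point
    return distance.argmin(axis=0)

centroids = [[-10, 20], [15, 7.5], [20, 20]]

# Hypothetical coarse grid covering the same range as the plot extent
xs = np.linspace(-15, 35, 5)
ys = np.linspace(-15, 35, 5)
small_x, small_y = np.meshgrid(xs, ys)

# Flatten the grid to (n_points, 2), label each point, then restore the grid shape
grid_points = np.vstack([small_x.ravel(), small_y.ravel()]).T
small_z = find_groups(centroids, grid_points).reshape(small_x.shape)
print(small_z)
```

With the default `meshgrid` indexing, row 0 of the result holds the lowest y value, which is why the plotting cell passes `origin='lower'` to `imshow`.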
In [10]:
#Set up our plots
fig, ax = plt.subplots()
#Plot all the actual data
A.plot.scatter(x='X1', y='X2', label='A', ax=ax, color='red')
B.plot.scatter(x='X1', y='X2', label='B', ax=ax, color='green')
C.plot.scatter(x='X1', y='X2', label='C', ax=ax, color='blue')
#Plot the regions
ax.imshow(z, interpolation='nearest',
          extent=(-15, 35, -15, 35),
          cmap=cmap,
          alpha=.4,
          aspect='auto', origin='lower')
plt.show()
Below we re-define a few functions we have been using up until now, which we will use to visualize the algorithm. I am also going to wrap our plotting code in a new function.
In [11]:
def find_groups(centroids, X):
    #Get the distance from every point to each centroid
    distance = [((X - c) ** 2).sum(axis=1) ** .5 for c in centroids]
    #Stack the distances (one row per centroid)
    distance = np.vstack(distance)
    #Find the groups (index of the nearest centroid)
    groups = distance.argmin(axis=0)
    return groups

def compute_centroids(X, centroids, groups):
    #Find the centers (mean of the points assigned to each group)
    centroids = [X[groups == l].mean(axis=0) for l in range(len(centroids))]
    #Stack
    centroids = np.vstack(centroids)
    return centroids

def plot_clusters(A, B, C, x_grid, y_grid, centroids):
    #Find all the labels in the grid
    z = find_groups(centroids, np.vstack([x_grid.ravel(), y_grid.ravel()]).T)
    z = z.reshape(x_grid.shape)

    #Set up our plots
    fig, ax = plt.subplots()

    #Plot all the actual data
    A.plot.scatter(x='X1', y='X2', label='A', ax=ax, color='red')
    B.plot.scatter(x='X1', y='X2', label='B', ax=ax, color='green')
    C.plot.scatter(x='X1', y='X2', label='C', ax=ax, color='blue')

    #Plot the regions
    ax.imshow(z, interpolation='nearest',
              extent=(-15, 35, -15, 35),
              cmap=cmap,
              alpha=.4,
              aspect='auto', origin='lower')
    plt.show()
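One caveat with `compute_centroids`: if a centroid ends up with no points assigned to it, `X[groups == l]` is empty and its mean is `nan`, which then carries into the next round of distance computations. Below is a minimal sketch of one possible guard that keeps the previous centroid for an empty group. The name `compute_centroids_safe` and the toy data are hypothetical and not part of the original notebook.

```python
import numpy as np

def compute_centroids_safe(X, centroids, groups):
    # Start from the previous centroids so empty groups keep their old position
    old = np.asarray(centroids, dtype=float)
    new = old.copy()
    for l in range(len(old)):
        members = X[groups == l]
        # Only update a centroid that actually has points assigned to it
        if len(members) > 0:
            new[l] = members.mean(axis=0)
    return new

# Toy data (made up): group 1 has no members
X_demo = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
groups_demo = np.array([0, 0, 2])
start = [[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]]

updated = compute_centroids_safe(X_demo, start, groups_demo)
print(updated)
```

Keeping the old centroid is only one option; re-seeding the empty centroid at a random data point is another common choice.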
And now we can look at how the iterations shift over time until convergence.
In [12]:
max_iter = 1000

#Start with labels as -1 meaning null
labels = np.ones(len(X)) * -1

#Define starting centroids
centroids = [[-10, 20],
             [15, 7.5],
             [20, 20]]
plot_clusters(A, B, C, x_grid, y_grid, centroids)

num_iter = 0
for _ in range(max_iter):
    num_iter += 1
    #Hold onto the old labels
    old_labels = labels.copy()

    #Find labels and re-compute the centers
    labels = find_groups(centroids, X)
    centroids = compute_centroids(X, centroids, labels)

    #Plot results
    plot_clusters(A, B, C, x_grid, y_grid, centroids)

    #If all labels are the same, end the iteration
    if (labels == old_labels).all():
        break
print("Converged after {} iterations.".format(num_iter))
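The same loop can be sketched as a reusable function with the plotting left out, which makes it easier to test in isolation. The function name `run_kmeans` and the small synthetic data set are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def find_groups(centroids, X):
    # Euclidean distance from every row of X to each centroid
    distance = [((X - c) ** 2).sum(axis=1) ** .5 for c in centroids]
    distance = np.vstack(distance)
    return distance.argmin(axis=0)

def compute_centroids(X, centroids, groups):
    # Mean of the points assigned to each group
    centroids = [X[groups == l].mean(axis=0) for l in range(len(centroids))]
    return np.vstack(centroids)

def run_kmeans(X, centroids, max_iter=1000):
    # Labels start at -1 (null), so the first pass can never count as converged
    labels = np.ones(len(X)) * -1
    num_iter = 0
    for _ in range(max_iter):
        num_iter += 1
        old_labels = labels.copy()
        labels = find_groups(centroids, X)
        centroids = compute_centroids(X, centroids, labels)
        # Stop once no point changes group
        if (labels == old_labels).all():
            break
    return centroids, labels, num_iter

# Synthetic, well-separated blobs (made up for this sketch)
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(20, 2)),
    rng.normal(loc=[10, 10], scale=0.5, size=(20, 2)),
    rng.normal(loc=[20, 0], scale=0.5, size=(20, 2)),
])
start = [[1, 1], [9, 9], [19, 1]]

final_centroids, final_labels, n = run_kmeans(X_demo, start)
print("Converged after {} iterations.".format(n))
```

Because the first pass compares against the `-1` placeholder labels, the loop always needs at least two passes before it can report convergence.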
If we try another set of starting centroids, we can once again see the path the algorithm takes to eventually reach convergence.
In [13]:
max_iter = 1000

#Start with labels as -1 meaning null
labels = np.ones(len(X)) * -1

#Define starting centroids
centroids = [[-25, 25],
             [15, 7.5],
             [30, 40]]
plot_clusters(A, B, C, x_grid, y_grid, centroids)

num_iter = 0
for _ in range(max_iter):
    num_iter += 1
    #Hold onto the old labels
    old_labels = labels.copy()

    #Find labels and re-compute the centers
    labels = find_groups(centroids, X)
    centroids = compute_centroids(X, centroids, labels)

    #Plot results
    plot_clusters(A, B, C, x_grid, y_grid, centroids)

    #If all labels are the same, end the iteration
    if (labels == old_labels).all():
        break
print("Converged after {} iterations.".format(num_iter))