Visualizing the Iterations


Now suppose we are once again starting from an arbitrary set of centroids. Since we now need to do the assignment step ourselves, let's re-define a function that finds the group each point belongs to, i.e. the index of its nearest centroid. We'll check it with a dummy point to make sure the output makes sense.

In [8]:

def find_groups(centroids, X):
    #Get the distance
    distance = [((X - c) ** 2).sum(axis=1) ** .5 for c in centroids]

    #Stack the distances
    distance = np.vstack(distance)

    #Find the groups
    groups = distance.argmin(axis=0)

    return groups

#Define centroids
centroids = [[-10, 20],
             [15, 7.5],
             [20, 20]]

#Test a point out
print(find_groups(centroids, np.array([[-10.1, 20.1]])))

[0]
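As a side note, the list comprehension in find_groups can also be written with NumPy broadcasting, and the square root can be dropped because it never changes which centroid is nearest. This is a minimal sketch assuming only NumPy; `find_groups_broadcast` is a hypothetical name, not a function defined elsewhere in this notebook.

```python
import numpy as np

def find_groups_broadcast(centroids, X):
    """Assign each row of X to its nearest centroid (hypothetical helper)."""
    C = np.asarray(centroids, dtype=float)   # shape (k, d)
    X = np.asarray(X, dtype=float)           # shape (n, d)
    # X[:, None, :] - C[None, :, :] broadcasts to shape (n, k, d)
    sq_dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    # sqrt is monotonic, so argmin over squared distances picks the same centroid
    return sq_dist.argmin(axis=1)

centroids = [[-10, 20], [15, 7.5], [20, 20]]
points = np.array([[-10.1, 20.1], [14.0, 8.0], [19.0, 21.0]])
print(find_groups_broadcast(centroids, points))   # [0 1 2]
```

The broadcast version builds an (n, k, d) intermediate array, so for very large grids the original per-centroid loop uses less memory.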

With that working, we can find the label of every point on the plotting grid.

In [9]:

#Find all the labels in the grid
z = find_groups(centroids, np.vstack([x_grid.ravel(), y_grid.ravel()]).T)
z = z.reshape(x_grid.shape)
print(z)

[[0 0 0 ... 1 1 1]
 [0 0 0 ... 1 1 1]
 [0 0 0 ... 1 1 1]
 ...
 [0 0 0 ... 2 2 2]
 [0 0 0 ... 2 2 2]
 [0 0 0 ... 2 2 2]]
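To see what the ravel / vstack / reshape round trip does, here is a coarse stand-in grid. It assumes x_grid and y_grid come from np.meshgrid over the plotted extent of -15 to 35; the real grid is defined earlier in the notebook, so the 6 x 6 resolution below is purely illustrative.

```python
import numpy as np

def find_groups(centroids, X):
    # Same logic as find_groups above: distance to each centroid, then argmin
    distance = np.vstack([((X - c) ** 2).sum(axis=1) ** .5 for c in centroids])
    return distance.argmin(axis=0)

# Assumed grid: 6 x 6 points spanning the plotted extent of -15 to 35
xs = np.linspace(-15, 35, 6)
ys = np.linspace(-15, 35, 6)
x_grid, y_grid = np.meshgrid(xs, ys)   # each has shape (6, 6); row i has y = ys[i]

# Flatten into an (n_points, 2) array of (x, y) pairs
points = np.vstack([x_grid.ravel(), y_grid.ravel()]).T
print(points.shape)   # (36, 2)

# Label every grid point, then fold the labels back into the grid's shape
centroids = [[-10, 20], [15, 7.5], [20, 20]]
z = find_groups(centroids, points).reshape(x_grid.shape)
print(z)
```

Row 0 of z holds the smallest y value, which is why the imshow call that follows uses origin='lower' so the shaded regions line up with the scatter plot.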

In [10]:

#Set up our plots
fig, ax = plt.subplots()

#Plot all the actual data
A.plot.scatter(x='X1', y='X2', label='A', ax=ax, color='red')
B.plot.scatter(x='X1', y='X2', label='B', ax=ax, color='green')
C.plot.scatter(x='X1', y='X2', label='C', ax=ax, color='blue')

#Plot the regions
ax.imshow(z, interpolation='nearest',
           extent=(-15, 35, -15, 35),
           cmap=cmap,
           alpha=.4,
           aspect='auto', origin='lower')

plt.show()

Below we re-define the functions we have been using so far, since we will rely on them to visualize the algorithm. I am also going to wrap our plotting code in a new function, plot_clusters.

In [11]:

def find_groups(centroids, X):
    #Get the distance
    distance = [((X - c) ** 2).sum(axis=1) ** .5 for c in centroids]

    #Stack the distances
    distance = np.vstack(distance)

    #Find the groups
    groups = distance.argmin(axis=0)

    return groups

def compute_centroids(X, centroids, groups):
    #Find the centers
    centroids = [X[groups == k].mean(axis=0) for k in range(len(centroids))]

    #Stack
    centroids = np.vstack(centroids)

    return centroids

def plot_clusters(A, B, C, x_grid, y_grid, centroids):
    #Find all the labels in the grid
    z = find_groups(centroids, np.vstack([x_grid.ravel(), y_grid.ravel()]).T)
    z = z.reshape(x_grid.shape)

    #Set up our plots
    fig, ax = plt.subplots()

    #Plot all the actual data
    A.plot.scatter(x='X1', y='X2', label='A', ax=ax, color='red')
    B.plot.scatter(x='X1', y='X2', label='B', ax=ax, color='green')
    C.plot.scatter(x='X1', y='X2', label='C', ax=ax, color='blue')

    #Plot the regions
    ax.imshow(z, interpolation='nearest',
               extent=(-15, 35, -15, 35),
               cmap=cmap,
               alpha=.4,
               aspect='auto', origin='lower')

    plt.show()
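One caveat with compute_centroids: if a centroid ends up owning no points, the selected slice of X is empty and its mean is NaN (NumPy also emits a RuntimeWarning). A hedged variant that guards against this keeps the empty cluster's previous centroid; `compute_centroids_safe` is a hypothetical name, and keeping the old position is only one of several possible fallbacks.

```python
import numpy as np

def compute_centroids_safe(X, centroids, groups):
    """Mean of each cluster; a cluster with no points keeps its previous centroid."""
    X = np.asarray(X, dtype=float)
    new = np.asarray(centroids, dtype=float).copy()
    for k in range(len(new)):
        members = X[groups == k]
        # Only move the centroid when the cluster actually has members
        if len(members) > 0:
            new[k] = members.mean(axis=0)
    return new

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
groups = np.array([0, 0, 1, 1])
centroids = [[1.0, 1.0], [11.0, 11.0], [50.0, 50.0]]   # third centroid owns no points
new_centroids = compute_centroids_safe(X, centroids, groups)
print(new_centroids)   # rows: [1, 0], [11, 10], [50, 50]
```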

Now we can watch how the centroids and regions shift with each iteration until convergence.

In [12]:

max_iter = 1000

#Start with labels as -1 meaning null
labels = np.ones(len(X)) * -1

#Define starting centroids
centroids = [[-10, 20],
             [15, 7.5],
             [20, 20]]

plot_clusters(A, B, C, x_grid, y_grid, centroids)
num_iter = 0
for _ in range(max_iter):
    num_iter += 1

    #Hold onto the old labels
    old_labels = labels.copy()

    #Find labels and re-compute the centers
    labels = find_groups(centroids, X)
    centroids = compute_centroids(X, centroids, labels)

    #Plot results
    plot_clusters(A, B, C, x_grid, y_grid, centroids)

    #If no label changed since the last pass, the algorithm has converged
    if (labels == old_labels).all():
        break

print("Converged after {} iterations.".format(num_iter))

Converged after 4 iterations.
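The plots show where the centroids move, but it can also help to track a number. The sketch below strips the loop above down to NumPy, runs it on synthetic Gaussian blobs (a hypothetical stand-in for A, B and C, which are defined earlier in the notebook), and records the within-cluster sum of squared distances after each assignment step. For this Lloyd-style loop that quantity should never increase. The function name `kmeans` is illustrative only.

```python
import numpy as np

def kmeans(X, centroids, max_iter=1000):
    """Assignment/update loop mirroring the cell above, without plotting."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    labels = np.full(len(X), -1)   # -1 means "not assigned yet"
    inertia_history = []
    num_iter = 0
    for _ in range(max_iter):
        num_iter += 1
        old_labels = labels.copy()
        # Assignment step: nearest centroid for every point
        sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Sum of squared distances from each point to its assigned centroid
        inertia_history.append(sq_dist[np.arange(len(X)), labels].sum())
        # Update step: each centroid moves to the mean of its points
        centroids = np.vstack([X[labels == k].mean(axis=0)
                               for k in range(len(centroids))])
        # Converged once no point changes cluster
        if (labels == old_labels).all():
            break
    return centroids, labels, num_iter, inertia_history

# Hypothetical stand-in data: three Gaussian blobs of 50 points each
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(loc=c, scale=3, size=(50, 2))
                    for c in [(0, 20), (15, 0), (25, 25)]])

centroids, labels, num_iter, history = kmeans(X_demo, [[-10, 20], [15, 7.5], [20, 20]])
print("Converged after {} iterations.".format(num_iter))
print(np.round(history, 1))
```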

If we try a different set of starting centroids, we can once again follow the path the algorithm takes to convergence.

In [13]:

max_iter = 1000

#Start with labels as -1 meaning null
labels = np.ones(len(X)) * -1

#Define starting centroids
centroids = [[-25, 25],
             [15, 7.5],
             [30, 40]]

plot_clusters(A, B, C, x_grid, y_grid, centroids)
num_iter = 0
for _ in range(max_iter):
    num_iter += 1

    #Hold onto the old labels
    old_labels = labels.copy()

    #Find labels and re-compute the centers
    labels = find_groups(centroids, X)
    centroids = compute_centroids(X, centroids, labels)

    #Plot results
    plot_clusters(A, B, C, x_grid, y_grid, centroids)

    #If no label changed since the last pass, the algorithm has converged
    if (labels == old_labels).all():
        break

print("Converged after {} iterations.".format(num_iter))

Converged after 6 iterations.
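Different starting centroids can need different numbers of iterations, and in general they can also settle on different final clusterings. A common remedy is to run from several starts and keep the run with the lowest within-cluster sum of squares; scikit-learn's KMeans does this through its n_init parameter. The sketch below compares the two starting points used above on hypothetical synthetic blobs. `lloyd` is an illustrative name, and unlike compute_centroids it keeps an empty cluster's previous centroid so a poorly placed start cannot produce NaN.

```python
import numpy as np

def lloyd(X, centroids, max_iter=1000):
    """Compact k-means loop; returns (centroids, labels, num_iter, inertia)."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float).copy()
    labels = np.full(len(X), -1)
    num_iter = 0
    for _ in range(max_iter):
        num_iter += 1
        old_labels = labels
        sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members) > 0:   # an empty cluster keeps its old centroid
                centroids[k] = members.mean(axis=0)
        if (labels == old_labels).all():
            break
    inertia = ((X - centroids[labels]) ** 2).sum()
    return centroids, labels, num_iter, inertia

# Hypothetical stand-in data: three Gaussian blobs of 50 points each
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=c, scale=3, size=(50, 2))
                    for c in [(0, 20), (15, 0), (25, 25)]])

starts = {
    "first start": [[-10, 20], [15, 7.5], [20, 20]],
    "second start": [[-25, 25], [15, 7.5], [30, 40]],
}
results = {name: lloyd(X_demo, init) for name, init in starts.items()}
for name, (_, _, n, inertia) in results.items():
    print("{}: {} iterations, inertia {:.1f}".format(name, n, inertia))

# Keep the run with the lowest within-cluster sum of squares
best = min(results, key=lambda name: results[name][3])
print("Lowest inertia:", best)
```

Because the points here are synthetic, the printed iteration counts need not match the 4 and 6 reported above.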
