K-means may give different results, because the initial choice of centroids is random.

However, if I choose k=1, will the algorithm always return the same answer, namely the "barycentre" of my data?

user

1 Answer

Yes. With k=1 every data point is assigned to the single cluster, so the first update step moves the centroid to the mean of all your data, whatever its initial position. The next iteration leaves it unchanged (zero center shift) and the algorithm stops, so the result is always the same.
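As a short sketch of why that mean is the answer k-means is looking for: writing $x_1, \dots, x_n$ for the data points and $c$ for the single centroid (notation assumed here, it is not from the question), the k-means objective for k=1 and its stationary point are

$$
J(c) = \sum_{i=1}^{n} \lVert x_i - c \rVert^2,
\qquad
\nabla_c J(c) = -2 \sum_{i=1}^{n} (x_i - c) = 0
\;\Longrightarrow\;
c = \frac{1}{n} \sum_{i=1}^{n} x_i .
$$

Since $J$ is a convex quadratic in $c$, this stationary point is its unique minimiser, so there are no other solutions for a different initialisation to land on.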

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Generate data
class1 = np.random.randn(1000, 2)

class2 = np.random.randn(1000, 2)
class2[:,0] = 4+class2[:,0]
class2[:,1] = 4+class2[:,1]

class3 = np.random.randn(1000, 2)
class3[:,0] = -4+class3[:,0]
class3[:,1] = 4+class3[:,1]

data = np.vstack([class1, class2, class3])
print(data.shape)

# Plot the data
plt.scatter(data[:,0], data[:,1])
plt.show()

# Cluster
kmeans = KMeans(n_clusters=1, random_state=0, verbose = 1).fit(data)

# Plot clustered results
plt.scatter(data[:,0], data[:,1], c=kmeans.labels_)
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], c = 'r')
plt.show()

# Show the cluster centers
print(kmeans.cluster_centers_)

Initialization complete
Iteration 0, inertia 81470.055
Iteration 1, inertia 48841.695
Converged at iteration 1: center shift 0.000000e+00 within tolerance 8.140283e-04
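To check that the result does not depend on the starting centroid, here is a minimal NumPy-only sketch of the k=1 case. It is not scikit-learn's implementation: `lloyd_single_cluster` is a hypothetical helper written for illustration, and the synthetic data and the five random starting points are assumptions.

```python
import numpy as np


def lloyd_single_cluster(data, init, max_iter=10):
    """Plain Lloyd iterations for k=1, starting from `init` (hypothetical helper)."""
    center = np.asarray(init, dtype=float)
    for iteration in range(1, max_iter + 1):
        # Assignment step: with a single centroid every point gets label 0.
        labels = np.zeros(len(data), dtype=int)
        # Update step: the centroid becomes the mean of its assigned points.
        new_center = data[labels == 0].mean(axis=0)
        if np.allclose(new_center, center):
            break  # the centroid did not move, so we have converged
        center = new_center
    return center, iteration


rng = np.random.default_rng(0)
# Synthetic 2-D data, shifted away from the origin (assumed for the demo).
data = rng.normal(size=(300, 2)) + np.array([4.0, -1.0])
barycentre = data.mean(axis=0)

# Start from five randomly chosen data points.
starts = [data[rng.integers(len(data))] for _ in range(5)]
results = [lloyd_single_cluster(data, start) for start in starts]

for center, n_iter in results:
    print(center, n_iter, np.allclose(center, barycentre))
```

Every run should end at the barycentre: one pass moves the centroid there, and a second pass only confirms that it no longer moves, which is the same pattern as in the verbose log above.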

JahKnows