
K-Means Clustering

With no labels at all, k-means groups similar data points into k clusters.

What you will learn

Explain clustering and when to use it
Run KMeans on unlabelled data
Read which cluster each point joined

Finding groups without answers

So far every example came with a label. But often you have data with no answers and just want to find natural groups. That is clustering, a kind of unsupervised learning.

A shop might cluster customers into groups like “big spenders” and “bargain hunters” — without anyone labelling them first. The algorithm discovers the groups itself.

How k-means works, in plain words

1. You choose k, the number of groups you want.
2. The algorithm places k centre points at random.
3. Each data point joins its nearest centre.
4. Each centre moves to the mean (the average position) of the points that joined it.
5. Repeat steps 3–4 until the centres stop moving.
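The steps above can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements KMeans: the function name kmeans_sketch is hypothetical, the starting centres are picked from the data points themselves (one common way to "drop centres at random"), and an empty cluster simply keeps its old centre.

```python
import math
import random


def kmeans_sketch(points, k, max_iters=100, seed=0):
    """Hypothetical helper: a bare-bones version of the k-means loop."""
    rng = random.Random(seed)
    # Step 2 (assumption): use k randomly chosen data points as starting centres.
    centres = [list(p) for p in rng.sample(points, k)]

    for _ in range(max_iters):
        # Step 3: each point joins its nearest centre (straight-line distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]

        # Step 4: each centre moves to the mean of the points that joined it.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Assumption: an empty cluster keeps its previous centre.
                new_centres.append(centres[c])

        # Step 5: stop once the centres no longer move.
        if new_centres == centres:
            break
        centres = new_centres

    return labels, centres


# [visits_per_month, money_spent]
X = [[1, 50], [2, 60], [1, 40],
     [10, 500], [12, 550], [11, 520]]

labels, centres = kmeans_sketch(X, k=2)
print('Labels:', labels)
print('Centres:', centres)
```

On this small data set the first three customers end up sharing one label and the last three the other; which group is called 0 and which is called 1 depends on the random start.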

A worked example: grouping customers

Each customer has two features — visits per month and money spent. We ask k-means for 2 groups.

k-means splitting customers into 2 groups (no labels given)

from sklearn.cluster import KMeans

# [visits_per_month, money_spent]
X = [[1, 50], [2, 60], [1, 40],        # light shoppers
     [10, 500], [12, 550], [11, 520]]  # heavy shoppers

model = KMeans(n_clusters=2, random_state=0, n_init=10)
model.fit(X)

# .tolist() gives plain Python ints, so the list prints as [1, 1, ...] on any NumPy version
print('Cluster labels:', model.labels_.tolist())
print('New customer [11, 530] ->', model.predict([[11, 530]])[0])

Note: Output:

Cluster labels: [1, 1, 1, 0, 0, 0]
New customer [11, 530] -> 0

Without any labels, k-means split the customers into two groups: light shoppers (cluster 1) and heavy shoppers (cluster 0). The new big spender is placed in cluster 0. The group numbers are just names, so 0 and 1 could be swapped.

Watch out: In clustering there is no “correct” label to check against. The numbers (0, 1, 2…) are arbitrary group ids, and you decide what each cluster means by looking at it.
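One way to "look at" a cluster is to inspect its centre. The sketch below refits the model from the worked example and prints cluster_centers_, the fitted KMeans attribute that holds one centre per cluster. The descriptive wording in the print statement is only illustrative, and no expected output is shown because the cluster numbering may differ between runs.

```python
from sklearn.cluster import KMeans

# [visits_per_month, money_spent], same data as the worked example
X = [[1, 50], [2, 60], [1, 40],
     [10, 500], [12, 550], [11, 520]]

model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

# cluster_centers_[i] is the average [visits_per_month, money_spent] of cluster i.
centres = model.cluster_centers_.tolist()
for cluster_id, (visits, spent) in enumerate(centres):
    print(f'Cluster {cluster_id}: about {visits:.1f} visits, {spent:.0f} spent')
```

The cluster whose centre shows few visits and low spending is the one you would name "light shoppers"; the other is "heavy shoppers". That naming step is yours, not the algorithm's.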

Tip: You must pick k yourself. Too few clusters lump different groups together; too many split one real group apart. People often try several values and inspect the results.
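Trying several values of k might look like the sketch below. It uses inertia_, a fitted KMeans attribute holding the sum of squared distances from each point to its nearest centre, as one possible number to inspect. This is only one way to compare values of k, and the range 1 to 4 is an arbitrary choice for this tiny data set.

```python
from sklearn.cluster import KMeans

# [visits_per_month, money_spent], same data as the worked example
X = [[1, 50], [2, 60], [1, 40],
     [10, 500], [12, 550], [11, 520]]

inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    # Lower inertia means points sit closer to their cluster centre.
    inertias[k] = model.inertia_
    print(f'k={k}: inertia={model.inertia_:.1f}')
```

Inertia keeps shrinking as k grows, so the smallest value is not automatically the best choice. A common habit is to look for the k after which the drop becomes small; here the big drop happens between k=1 and k=2, which matches the two groups in the data.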

Q. What makes k-means an unsupervised method?

Answer: k-means works on data with no labels — it discovers natural clusters by itself, which is the heart of unsupervised learning.

✍️ Practice

Change n_clusters to 3 and print the new labels_. How do the groups change?
Add a customer [2, 55] and predict which cluster they join.

🏠 Homework

Describe a real situation (school, shop, music app) where clustering with no labels would be useful, and what the clusters might mean.