K-Means Clustering
With no labels at all, k-means groups similar data points into k clusters.
What you will learn
- Explain clustering and when to use it
- Run KMeans on unlabelled data
- Read which cluster each point joined
Finding groups without answers
So far every example came with a label. But often you have data with no answers and just want to find natural groups. That is clustering, a kind of unsupervised learning.
A shop might cluster customers into groups like “big spenders” and “bargain hunters” — without anyone labelling them first. The algorithm discovers the groups itself.
How k-means works, in plain words
1. You choose k, the number of groups you want.
2. It drops k centre points at random.
3. Each data point joins its nearest centre.
4. Each centre moves to the middle of its points.
5. Repeat steps 3–4 until the centres stop moving.
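The steps above can be sketched in a few lines of NumPy. This is only a toy illustration, not scikit-learn's implementation: the function name simple_kmeans and its parameters are made up for this example, it assumes plain straight-line (Euclidean) distance, and it starts from k randomly chosen data points, whereas scikit-learn's KMeans uses a smarter starting strategy by default.

```python
import numpy as np

def simple_kmeans(points, k, n_iters=100, seed=0):
    """Toy k-means (hypothetical helper): returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Step 2: pick k distinct data points as the starting centres.
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: each point joins its nearest centre.
        distances = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: each centre moves to the mean of its points
        # (a centre that attracted no points stays where it is).
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        # Step 5: stop once the centres no longer move.
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels

# [visits_per_month, money_spent]
X = [[1, 50], [2, 60], [1, 40], [10, 500], [12, 550], [11, 520]]
centres, labels = simple_kmeans(X, k=2)
print('Centres:', centres.tolist())
print('Labels:', labels.tolist())
```

Which group ends up as 0 and which as 1 depends on the random start, but the first three customers should share one label and the last three the other.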
A worked example: grouping customers
Each customer has two features — visits per month and money spent. We ask k-means for 2 groups.
from sklearn.cluster import KMeans
# [visits_per_month, money_spent]
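X = [[1, 50], [2, 60], [1, 40],        # light shoppers
     [10, 500], [12, 550], [11, 520]]  # heavy shoppers
# Ask for 2 clusters. random_state fixes the random start so results repeat;
# n_init=10 runs the algorithm 10 times and keeps the best result.
model = KMeans(n_clusters=2, random_state=0, n_init=10)
model.fit(X)  # only features are passed, no labels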
# .tolist() converts the NumPy array to plain Python ints so the list prints cleanly
print('Cluster labels:', model.labels_.tolist())
print('New customer [11, 530] ->', model.predict([[11, 530]])[0])
Note: Output:
Cluster labels: [1, 1, 1, 0, 0, 0]
New customer [11, 530] -> 0
Without any labels, k-means split the customers into two groups: light shoppers (cluster 1) and heavy shoppers (cluster 0). The new big spender is placed in cluster 0. The group numbers are just names; 0 and 1 could be swapped.
Watch out: In clustering there is no “correct” label to check against. The numbers (0, 1, 2…) are arbitrary group ids, and you decide what each cluster means by looking at it.
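One way to "look at" a cluster is to print its centre, which is the average of the points in it. The sketch below reuses the customer data from the worked example and reads the centres from the fitted model's cluster_centers_ attribute. The names "light shoppers" and "heavy shoppers", and the spend threshold of 200, are our own interpretation chosen for this example; k-means does not produce them.

```python
from sklearn.cluster import KMeans

# [visits_per_month, money_spent]
X = [[1, 50], [2, 60], [1, 40],
     [10, 500], [12, 550], [11, 520]]
model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

# cluster_centers_ has one row per cluster: [average visits, average spend]
for cluster_id, (visits, spend) in enumerate(model.cluster_centers_):
    # The threshold of 200 is an arbitrary assumption for this toy data.
    name = 'heavy shoppers' if spend > 200 else 'light shoppers'
    print(f'Cluster {cluster_id}: avg visits {visits:.1f}, avg spend {spend:.1f} -> {name}')
```

Naming clusters from their centres, rather than from the id numbers, keeps the interpretation stable even if the ids come out swapped on another run.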
Tip: You must pick k yourself. Too few clusters lumps different groups together; too many splits one real group apart. People often try several values and inspect the results.
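A simple way to try several values of k is a loop, as sketched below on the same customer data. Alongside the labels it prints inertia_, a scikit-learn attribute holding the sum of squared distances from each point to its nearest centre. Lower inertia means tighter clusters, but it tends to keep falling as k grows, so it is a guide rather than an answer. The range of k values tried here is an arbitrary choice for this example.

```python
from sklearn.cluster import KMeans

# [visits_per_month, money_spent]
X = [[1, 50], [2, 60], [1, 40],
     [10, 500], [12, 550], [11, 520]]

inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_
    print(f'k={k}: inertia={model.inertia_:.1f}, labels={model.labels_.tolist()}')
```

Look for the value of k after which inertia stops dropping sharply, then check that the resulting groups make sense for your data.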
Q. What makes k-means an unsupervised method?
✍️ Practice
- Change n_clusters to 3 and print the new labels_. How do the groups change?
- Add a customer [2, 55] and predict which cluster they join.
🏠 Homework
- Describe a real situation (school, shop, music app) where clustering with no labels would be useful, and what the clusters might mean.