K-Nearest Neighbours (KNN)

To classify a new point, look at its closest neighbours and copy the majority.

What you will learn

Explain the KNN idea
Train KNeighborsClassifier
See how k changes the result

Judge a point by its neighbours

K-nearest neighbours (KNN) is one of the most intuitive classifiers. To label a new point, it finds the k closest examples it has seen and takes a majority vote among their labels.

It follows the old saying “you are known by the company you keep.” If the four fruits closest to a new fruit are three apples and one orange, KNN calls it an apple.

A worked example: fruit by size and weight

Each fruit has two features — width and weight — and a label. We will classify a mystery fruit by its nearest neighbours.

KNN classifies a fruit by its 3 nearest neighbours

from sklearn.neighbors import KNeighborsClassifier

# Features: [width_cm, weight_g]
X = [[7, 150], [7, 170], [6, 140],     # apples
     [8, 110], [9, 120], [8, 100]]     # oranges
y = ['apple','apple','apple',
     'orange','orange','orange']

model = KNeighborsClassifier(n_neighbors=3)   # look at 3 neighbours
model.fit(X, y)

mystery = [[7, 160]]
print('Mystery fruit ->', model.predict(mystery)[0])

Output:

Mystery fruit -> apple

Note: The mystery fruit (7 cm, 160 g) is closest to the three apples, which are heavier than the oranges. All three of its nearest neighbours are apples, so KNN votes “apple”.
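To see which neighbours actually voted, a fitted KNeighborsClassifier can be asked for them through its kneighbors method, which returns the distances and the row indices of the nearest training points. The snippet below is a small sketch that rebuilds the same model; the variable name neighbour_labels is our own choice for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Same fruit data as above: [width_cm, weight_g]
X = [[7, 150], [7, 170], [6, 140],     # apples
     [8, 110], [9, 120], [8, 100]]     # oranges
y = ['apple', 'apple', 'apple',
     'orange', 'orange', 'orange']

model = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# kneighbors returns (distances, indices) for the 3 nearest training points
distances, indices = model.kneighbors([[7, 160]])

# Look up the label of each neighbour by its row index
neighbour_labels = [y[i] for i in indices[0]]

for i, d in zip(indices[0], distances[0]):
    print(X[i], y[i], 'distance', round(float(d), 1))
```

Two of the apples are equally far from the mystery fruit, so the order in which they are listed may vary, but all three neighbours should be apples.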

What “closest” actually means

KNN measures distance between points, just like distance on a map. By default, scikit-learn's KNeighborsClassifier uses the straight-line (Euclidean, or Pythagoras) distance, and the same formula extends to any number of features. Let us compute how far our mystery fruit [7, 160] is from one apple [7, 150] and one orange [8, 110] so you can see why it picks apple.

Computing straight-line distance from the mystery fruit to two known fruits

mystery = [7, 160]

def distance(a, b):
    return ((a[0]-b[0])**2 + (a[1]-b[1])**2) ** 0.5

print('to apple [7,150]: ', round(distance(mystery, [7, 150]), 1))
print('to orange [8,110]:', round(distance(mystery, [8, 110]), 1))

Output:

to apple [7,150]:  10.0
to orange [8,110]: 50.0

Note: The apple is only 10 away but the orange is about 50 away, so the apple is a much nearer neighbour. KNN does this distance calculation for every known point, keeps the k smallest, and votes. (Notice that weight dominated here: that is the scaling problem in the warning below.)
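The whole procedure (measure the distance to every known point, keep the k smallest, vote) fits in a few lines of plain Python. The sketch below is only an illustration of the idea, not how scikit-learn implements it; the function name knn_predict is our own, and ties between labels are not handled specially.

```python
from collections import Counter

def distance(a, b):
    # Straight-line distance between two 2-feature points
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def knn_predict(X, y, point, k=3):
    # Pair each known point's distance with its label
    pairs = [(distance(point, x), label) for x, label in zip(X, y)]
    # Sort by distance and keep the k nearest
    nearest = sorted(pairs, key=lambda pair: pair[0])[:k]
    # Count the labels among those neighbours and return the most common one
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Same fruit data as above: [width_cm, weight_g]
X = [[7, 150], [7, 170], [6, 140],     # apples
     [8, 110], [9, 120], [8, 100]]     # oranges
y = ['apple', 'apple', 'apple',
     'orange', 'orange', 'orange']

prediction = knn_predict(X, y, [7, 160], k=3)
print('Mystery fruit ->', prediction)
```

For the mystery fruit [7, 160] this should agree with the scikit-learn model and print apple.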

Choosing k

The k in KNN is how many neighbours vote, and it can change the answer. (In the table below, an outlier means a stray, unusual data point that sits far from the rest of its group, like one giant apple among normal ones; with k = 1 a single outlier can swing the vote.)

k value	Behaviour	Risk
Small (k = 1)	Very sensitive, follows every point	Reacts to noise / outliers
Large (k = 15)	Very smooth, averages many points	Can blur real boundaries
Medium (k = 3–7)	Usually a good balance	Try a few and test
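To make the table concrete, the sketch below adds one made-up outlier to the fruit data, an unusually heavy orange at [7, 158], and predicts the same mystery fruit with three values of k. The outlier and the dictionary name results are assumptions for illustration only; they are not part of the original dataset.

```python
from sklearn.neighbors import KNeighborsClassifier

# Original six fruits plus one hypothetical outlier orange
X = [[7, 150], [7, 170], [6, 140],     # apples
     [8, 110], [9, 120], [8, 100],     # oranges
     [7, 158]]                         # outlier: a very heavy orange (made up)
y = ['apple', 'apple', 'apple',
     'orange', 'orange', 'orange',
     'orange']

mystery = [[7, 160]]
results = {}

# Refit the model for each k and record its prediction
for k in (1, 3, 5):
    model = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    results[k] = model.predict(mystery)[0]
    print('k =', k, '->', results[k])
```

With k = 1 the single nearest point is the outlier, so the model should answer orange. With k = 3 or k = 5 the surrounding apples outvote it and the answer should return to apple.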

Watch out: KNN compares distances, so features on bigger scales (like weight 100–170) can drown out smaller ones (like width 6–9). Feature scaling fixes this — a topic in the last unit.
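As a preview of how scaling can change a prediction, the sketch below compares a plain KNN model with one that first standardises both features using scikit-learn's StandardScaler inside a pipeline. The test fruit [9, 140] is a hypothetical example chosen for illustration: it is as wide as an orange but as heavy as a light apple.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same fruit data as above: [width_cm, weight_g]
X = [[7, 150], [7, 170], [6, 140],     # apples
     [8, 110], [9, 120], [8, 100]]     # oranges
y = ['apple', 'apple', 'apple',
     'orange', 'orange', 'orange']

wide_fruit = [[9, 140]]   # hypothetical fruit: orange-like width, apple-like weight

# Plain KNN: raw grams dominate the distance
plain = KNeighborsClassifier(n_neighbors=3).fit(X, y)
plain_pred = plain.predict(wide_fruit)[0]

# Scaled KNN: both features are standardised before distances are measured
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)).fit(X, y)
scaled_pred = scaled.predict(wide_fruit)[0]

print('without scaling ->', plain_pred)
print('with scaling    ->', scaled_pred)
```

Without scaling, the weight differences dominate and the model should say apple. After scaling, width counts about as much as weight, and the three nearest neighbours should all be oranges.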

Tip: KNN does almost no work when training — it just stores the data. The effort happens at predict time, when it measures distances. That makes it simple but slow on very large datasets.

Q. How does KNN decide the label of a new point?

Answer: KNN finds the k nearest examples and uses their majority label (for classification) to label the new point.

✍️ Practice

Change n_neighbors to 1 and re-predict the mystery fruit. Does the answer change?
Add a mystery fruit [9, 115] and predict it — is it an apple or orange?

🏠 Homework

Explain in your own words why a very large k could make KNN ignore a small but real group in the data.