
Feature Scaling

Put features on the same scale so big-numbered columns do not dominate.

What you will learn

Explain why scale matters for some models
Scale features with StandardScaler
Know which models need scaling

When big numbers bully small ones

Imagine two features: age (around 20–60) and salary (around 20,000–80,000). To a distance-based model like KNN, salary’s huge numbers swamp age completely — age barely counts, just because its numbers are small.

Feature scaling rescales every column to a similar range, so each feature gets a fair say.
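
To make the imbalance concrete, here is a small sketch of the straight-line (Euclidean) distance that KNN uses by default in scikit-learn. The two people below are invented for illustration: they differ a lot in age and only slightly in salary.

```python
import math

# Two hypothetical people as [age, salary]; values invented for illustration.
person_a = [25, 50000]
person_b = [60, 51000]

# Straight-line (Euclidean) distance between the two rows.
distance = math.dist(person_a, person_b)

age_gap = abs(person_a[0] - person_b[0])        # 35 years
salary_gap = abs(person_a[1] - person_b[1])     # 1,000 in salary units

# Share of the squared distance that each feature contributes.
age_share = age_gap**2 / distance**2
salary_share = salary_gap**2 / distance**2

print('distance:', round(distance, 2))
print('age share:   ', round(age_share, 4))
print('salary share:', round(salary_share, 4))
```

A 35-year age gap is large and a 1,000 salary gap is small, yet age contributes only about 0.1% of the squared distance. Without scaling, the model would treat these two people as differing almost entirely in salary.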

Standardisation with StandardScaler

The most common method, standardisation, shifts and rescales each column so that its average is 0 and its spread (standard deviation) is 1. scikit-learn’s StandardScaler does it for you.

Standardising features to a comparable scale

from sklearn.preprocessing import StandardScaler

# [age, salary] — very different scales
X = [[25, 30000],
     [40, 60000],
     [60, 80000]]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

for row in X_scaled:
    # float() keeps the printout plain on newer NumPy versions
    print([round(float(v), 2) for v in row])

Note: Output:
[-1.16, -1.3]
[-0.12, 0.16]
[1.28, 1.14]
Both columns now sit in a similar small range around 0. Age and salary can finally be compared fairly: no column dominates just because of big numbers.

Where those numbers come from

Standardisation is one small formula applied to every value: scaled = (value − average) ÷ spread, where the “spread” is the standard deviation. Let us check the very first age, 25, by hand. The three ages are 25, 40 and 60, so their average is (25 + 40 + 60) ÷ 3 ≈ 41.67. Their spread, computed the way StandardScaler does it (dividing by the number of values, not by one less), works out to about 14.34.

Standardising the value 25 by hand: (value − average) ÷ spread

ages = [25, 40, 60]
avg = sum(ages) / len(ages)                                     # the mean
spread = (sum((a - avg)**2 for a in ages) / len(ages)) ** 0.5   # standard deviation (divide by n)

print('average:', round(avg, 2))
print('spread: ', round(spread, 2))
print('scaled age 25:', round((25 - avg) / spread, 2))

Note: Output:
average: 41.67
spread:  14.34
scaled age 25: -1.16
That -1.16 is exactly the first number StandardScaler produced above. A scaled value just says “how many spreads below or above average am I?”, and that scale is the same for every column, so no column can bully the others.
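
The hand calculation can also be cross-checked against what the scaler itself learned. After fitting, StandardScaler stores the per-column average in mean_ and the per-column spread in scale_. The sketch below reuses the same three rows; the variable names age_avg, age_spread, manual and from_scaler are chosen here for illustration.

```python
from sklearn.preprocessing import StandardScaler

# Same [age, salary] rows as in the example above.
X = [[25, 30000],
     [40, 60000],
     [60, 80000]]

scaler = StandardScaler().fit(X)

# mean_ and scale_ hold the average and standard deviation of each column.
print('averages:', [round(float(m), 2) for m in scaler.mean_])
print('spreads: ', [round(float(s), 2) for s in scaler.scale_])

# Apply the formula by hand to the first age and compare with transform().
age_avg = float(scaler.mean_[0])
age_spread = float(scaler.scale_[0])
manual = (25 - age_avg) / age_spread
from_scaler = float(scaler.transform([[25, 30000]])[0][0])

print('by hand:', round(manual, 2), '| from scaler:', round(from_scaler, 2))
```

Both routes should agree, which confirms that the scaler does nothing more than apply the formula column by column.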

Which models need scaling?

Needs scaling	Does NOT need scaling
KNN (uses distances)	Decision trees
PCA	Random forests
Models using gradient descent	Other tree-based models

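Gradient descent (in the table above) is just the step-by-step way many models learn: they start with a rough guess and nudge it a little at a time to reduce their mistakes, like walking downhill to the lowest point. The size of each nudge depends on the size of the feature values, so a large-scale column produces much bigger nudges than a small-scale one, and learning becomes slow or unstable. PCA has a similar problem: it looks for the directions in which the data varies most, so a large-scale column dominates there too. Like KNN, these models work best when every column is on the same scale.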
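
One way to see why scale matters for gradient descent is to look at the very first nudge. The sketch below is a simplified illustration, not how any particular library trains: it assumes a plain linear model with no intercept, a mean-squared-error loss, weights starting at 0, and a made-up target y. The helper first_step_gradient is a name invented for this example.

```python
# Rows are [age, salary]; y is a made-up target used only for illustration.
X = [[25, 30000],
     [40, 60000],
     [60, 80000]]
y = [1.0, 2.0, 3.0]

def first_step_gradient(X, y):
    """Gradient of mean squared error for a linear model with all weights at 0."""
    n = len(X)
    n_features = len(X[0])
    # With all weights at 0 every prediction is 0, so each error is just -y.
    return [sum(-2 * y[i] * X[i][j] for i in range(n)) / n
            for j in range(n_features)]

grad_age, grad_salary = first_step_gradient(X, y)

print('gradient for age weight:   ', round(grad_age, 1))
print('gradient for salary weight:', round(grad_salary, 1))
print('salary gradient is about', round(grad_salary / grad_age), 'times larger')
```

On this data the salary gradient comes out more than a thousand times larger than the age gradient. A step size small enough to keep the salary weight under control is then far too small for the age weight to move much, which is why unscaled features tend to slow training down.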

Watch out: Fit the scaler on the training data only, then apply it to the test data. Fitting it on everything leaks information from the test set into training — a subtle but real mistake.
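
In code, the rule comes down to calling fit_transform on the training rows and plain transform on the test rows. The sketch below uses a handful of invented [age, salary] rows and scikit-learn's train_test_split; the split size and random_state are arbitrary choices for illustration.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Invented [age, salary] rows, for illustration only.
X = [[25, 30000], [40, 60000], [60, 80000],
     [35, 45000], [50, 70000], [30, 38000]]

# Hold back 2 rows as the test set.
X_train, X_test = train_test_split(X, test_size=2, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn average and spread from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse those numbers; never fit on test rows

print('rows the scaler learned from:', scaler.n_samples_seen_)
print('training averages:', scaler.mean_)
print('scaled test rows:')
print(X_test_scaled)
```

The scaled test rows will generally not average exactly 0, and that is expected: they are measured against the training average and spread, just as genuinely new data would be.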

Tip: Tree-based models (decision tree, random forest) split one feature at a time, so they do not care about scale. Models based on distances, variance or gradient descent (KNN, PCA, neural nets) almost always do.
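
The claim about trees can be checked with a quick experiment: train one decision tree on the raw numbers and another on the standardised numbers, then compare their predictions. The rows, the yes/no labels and the new points below are all invented for illustration.

```python
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Invented [age, salary] rows with a made-up 0/1 label.
X = [[25, 30000], [40, 60000], [60, 80000],
     [35, 65000], [50, 40000], [30, 38000]]
y = [0, 1, 1, 0, 1, 0]

# New, unseen rows to predict.
X_new = [[28, 75000], [55, 32000], [45, 52000]]

scaler = StandardScaler().fit(X)

# One tree sees the raw numbers, the other sees the standardised ones.
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(scaler.transform(X), y)

pred_raw = tree_raw.predict(X_new).tolist()
pred_scaled = tree_scaled.predict(scaler.transform(X_new)).tolist()

print('raw:   ', pred_raw)
print('scaled:', pred_scaled)
```

Both trees give the same predictions. Standardising keeps the order of the values within each column, so a split such as "age below some threshold" separates exactly the same rows before and after scaling.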

Q. Which model most needs its features scaled before training?

Answer: KNN. It relies on distances between points, so a large-scale feature would dominate. Tree-based models are unaffected by scale.

✍️ Practice

Scale the data with MinMaxScaler instead and compare the output range.
Explain why a decision tree does not need scaling but KNN does.

🏠 Homework

Take any 2-feature dataset where the columns have very different ranges, scale it with StandardScaler, and note how the values change.