Feature Scaling
Put features on the same scale so big-numbered columns do not dominate.
What you will learn
- Explain why scale matters for some models
- Scale features with StandardScaler
- Know which models need scaling
When big numbers bully small ones
Imagine two features: age (around 20–60) and salary (around 20,000–80,000). To a distance-based model like KNN, salary’s huge numbers swamp age completely — age barely counts, just because its numbers are small.
Feature scaling rescales every column to a similar range, so each feature gets a fair say.
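To make the bullying concrete, here is a minimal sketch that measures straight-line (Euclidean) distance, a common choice for KNN, between three made-up people. The names and values are hypothetical and only there for illustration.

```python
import math

# Hypothetical people as [age, salary]; the values are illustrative only.
alice = [25, 30000]
bob = [60, 31000]    # 35 years older, salary 1,000 higher
carol = [26, 35000]  # 1 year older, salary 5,000 higher

def euclidean(p, q):
    """Straight-line distance between two rows of features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

dist_bob = euclidean(alice, bob)
dist_carol = euclidean(alice, carol)

print(round(dist_bob, 1))    # 1000.6
print(round(dist_carol, 1))  # 5000.0
```

On the raw numbers, Bob looks about five times "closer" to Alice than Carol does, even though Carol is almost the same age. The salary gap decides everything, and the 35-year age gap barely registers.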
Standardisation with StandardScaler
The most common method, standardisation, shifts each column to have an average of 0 and a typical spread of 1. scikit-learn’s StandardScaler does it for you.
from sklearn.preprocessing import StandardScaler
# [age, salary] — very different scales
X = [[25, 30000],
     [40, 60000],
     [60, 80000]]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
for row in X_scaled:
    # float() keeps the printout as plain numbers on any NumPy version
    print([round(float(v), 2) for v in row])
Note: Output:
[-1.16, -1.3]
[-0.12, 0.16]
[1.28, 1.14]
Both columns now sit in a similar small range around 0. Age and salary can finally be compared fairly: no column dominates just because of big numbers.
Where those numbers come from
Standardisation is one tiny formula applied to every value: scaled = (value − average) ÷ spread (the “spread” is the standard deviation). Let us check the very first age, 25, by hand. The three ages are 25, 40, 60, so their average is (25 + 40 + 60) ÷ 3 ≈ 41.67, and their spread works out to about 14.34.
ages = [25, 40, 60]
avg = sum(ages) / len(ages)  # the mean
spread = (sum((a - avg) ** 2 for a in ages) / len(ages)) ** 0.5  # standard deviation, dividing by n as StandardScaler does
print('average:', round(avg, 2))
print('spread: ', round(spread, 2))
print('scaled age 25:', round((25 - avg) / spread, 2))
Note: Output:
average: 41.67
spread:  14.34
scaled age 25: -1.16
That -1.16 is exactly the first number StandardScaler produced above. A scaled value just says “how many spreads below or above average am I?”, and that scale is the same for every column, so no column can bully the others.
Which models need scaling?
| Needs scaling | Does NOT need scaling |
|---|---|
| KNN (uses distances) | Decision trees |
| PCA | Random forests |
| Models using gradient descent | Other tree-based models |
Gradient descent (in the table above) is just the step-by-step way many models learn: they start with a rough guess and nudge it a little at a time to reduce their mistakes, like walking downhill to the lowest point. The size of each nudge depends on the feature values, so a big-numbered column produces far larger nudges than a small-numbered one, and learning becomes slow or unstable. Like KNN, these models work best when every column is on the same scale.
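One way to see why gradient-based models are sensitive to scale is to compute the very first nudge by hand. The sketch below assumes a plain linear model trained with mean squared error, with all weights starting at 0. It reuses the [age, salary] rows from earlier, and the target values `y` are made up for illustration.

```python
# Same [age, salary] rows as earlier; the targets y are hypothetical.
X = [[25, 30000],
     [40, 60000],
     [60, 80000]]
y = [1.0, 2.0, 3.0]

# Gradient of mean squared error for a linear model with all weights at 0:
# d(MSE)/d(w_j) = -(2/n) * sum(x_ij * y_i)
n = len(X)
grad_age = -2 / n * sum(row[0] * t for row, t in zip(X, y))
grad_salary = -2 / n * sum(row[1] * t for row, t in zip(X, y))

print(round(grad_age, 1))     # -190.0
print(round(grad_salary, 1))  # -260000.0
```

The salary gradient is over a thousand times larger than the age gradient. A step size small enough to keep the salary weight stable moves the age weight very slowly, which is why putting both columns on the same scale usually helps.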
Watch out: Fit the scaler on the training data only, then apply it to the test data. Fitting it on everything leaks information from the test set into training — a subtle but real mistake.
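A minimal sketch of that rule, assuming a tiny hand-made split: the three rows from earlier act as training data and one extra, hypothetical row acts as the test set. The scaler learns its average and spread from the training rows only, then reuses them on the test row.

```python
from sklearn.preprocessing import StandardScaler

# Hypothetical split: three training rows and one held-out test row.
X_train = [[25, 30000],
           [40, 60000],
           [60, 80000]]
X_test = [[50, 70000]]

scaler = StandardScaler()
scaler.fit(X_train)                         # learn average and spread from training rows only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)    # reuse the training average and spread

print(scaler.mean_)      # averages learned from X_train
print(X_test_scaled)     # test row expressed on the training scale
```

Note that `fit` is never called on `X_test`. Calling `fit_transform` on the test rows, or on training and test rows together, is the leak the warning describes.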
Tip: Tree-based models (decision tree, random forest) split one feature at a time by comparing it to a threshold, so they do not care about scale. Distance-, variance- and gradient-based models (KNN, PCA, neural nets) almost always do.
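The tree half of that tip can be checked with a few lines of plain Python. This is only a sketch: the threshold of 45,000 is a hypothetical split point, not something a real tree was trained to pick. Standardising keeps the rows in the same order, so the same rows land on the same side of the split.

```python
salaries = [30000, 60000, 80000]

avg = sum(salaries) / len(salaries)
spread = (sum((s - avg) ** 2 for s in salaries) / len(salaries)) ** 0.5
scaled = [(s - avg) / spread for s in salaries]

threshold = 45000                               # hypothetical split point
scaled_threshold = (threshold - avg) / spread   # the same point on the scaled axis

left_raw = [s <= threshold for s in salaries]
left_scaled = [z <= scaled_threshold for z in scaled]

print(left_raw)                 # [True, False, False]
print(left_raw == left_scaled)  # True
```

The split separates the rows identically before and after scaling, so a tree can find an equally good split either way. A distance, by contrast, changes as soon as the numbers change.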
Q. Which model most needs its features scaled before training?
✍️ Practice
- Scale the data with MinMaxScaler instead and compare the output range.
- Explain why a decision tree does not need scaling but KNN does.
🏠 Homework
- Take any 2-feature dataset where the columns have very different ranges, scale it with StandardScaler, and note how the values change.