
PCA: Dimensionality Reduction

Squeeze many columns down to a few while keeping most of the information.

What you will learn

Explain why fewer features can help
Run PCA to reduce dimensions
Read how much information is kept

Too many columns is a problem

Real datasets can have dozens or hundreds of features. That makes them hard to plot and slow to train on, and many of the columns overlap (they carry nearly the same information). PCA (Principal Component Analysis) shrinks many features down to a few, while keeping most of the information.

Think of a shadow. A 3D object casts a 2D shadow on the wall. You lose some detail, but a good shadow still shows the shape clearly. PCA finds the best “angle” so the flattened data keeps as much shape as possible.
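To make the shadow idea concrete, here is a minimal sketch on made-up data (the toy points, the seed, and the variable names are all assumptions for illustration). It flattens a 3D cloud to 2D in two ways: by simply dropping the height axis, and by using the two directions with the most spread, which is roughly what PCA does internally. It then compares how much of the spread each shadow keeps.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the toy data is repeatable

# Hypothetical toy data: 200 points in 3D that really only vary in 2 directions
latent = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])
basis = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
points = latent @ basis + rng.normal(scale=0.1, size=(200, 3))  # plus a little noise
points = points - points.mean(axis=0)  # centre the cloud, as PCA does

total_spread = points.var(axis=0).sum()

# Shadow 1: look straight down, i.e. just drop the third axis
flat_kept = points[:, :2].var(axis=0).sum() / total_spread

# Shadow 2: pick the best angle, i.e. the 2 directions with the most spread (via SVD)
_, _, directions = np.linalg.svd(points, full_matrices=False)
best_shadow = points @ directions[:2].T
best_kept = best_shadow.var(axis=0).sum() / total_spread

print('Spread kept, straight-down shadow:', round(flat_kept, 3))
print('Spread kept, best-angle shadow:   ', round(best_kept, 3))
```

On this toy cloud the best-angle shadow keeps nearly all of the spread, while the straight-down one keeps much less. The best-angle shadow can never keep less than any other flat view of the same data.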

A worked example: 4 features down to 2

The classic iris flower dataset has 4 measurements per flower. We squeeze them to 2 so they can be plotted.

Reducing 4 features to 2 with PCA

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data        # 150 flowers, 4 features each
print('Before PCA:', X.shape)

pca = PCA(n_components=2)    # keep just 2 combined features
X2 = pca.fit_transform(X)
print('After PCA: ', X2.shape)
print('Info kept:', round(pca.explained_variance_ratio_.sum(), 3))

Note: Output:
Before PCA: (150, 4)
After PCA:  (150, 2)
Info kept: 0.978

We went from 4 columns to 2, yet kept about 98% of the information. Now the flowers can be drawn on a simple 2D scatter plot, which is much easier to explore.

“Info kept: 0.978” means the 2 new columns still carry about 98% of the variance (the spread, or differences between flowers) that all 4 original columns had. We dropped half the columns but lost only about 2% of that variance, which is a good trade here. You can even check the split per column:

How the kept information is split across the 2 new columns

# How much information each of the 2 new columns carries
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print('Component', i, '->', round(ratio, 3))

Note: Output:
Component 1 -> 0.925
Component 2 -> 0.053

The first new column alone carries about 92% of the variance; the second adds about 5% more, giving the ~98% total we saw. PCA always orders its components this way: the first carries the most variance, the second the next most, and so on.
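Because the components are ordered, a running total of these ratios shows how many components you need to reach a given share of the information. The sketch below prints that running total for all 4 iris components, then asks scikit-learn to pick the count for you by passing a fraction as n_components. The 0.95 target is an assumed threshold chosen for illustration, not a rule.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Fit with every component kept so we can see the full breakdown
full = PCA().fit(X)
cumulative = np.cumsum(full.explained_variance_ratio_)
for i, kept in enumerate(cumulative, start=1):
    print('First', i, 'component(s) keep', round(kept, 3))

# A fraction between 0 and 1 asks for the smallest number of
# components whose combined ratio exceeds that fraction
target = 0.95  # assumed threshold, adjust to taste
auto = PCA(n_components=target, svd_solver='full').fit(X)
print('Components needed for', target, '->', auto.n_components_)
```

For iris this selects 2 components, matching the worked example: one component keeps about 92%, which falls short of the target, and two keep about 98%.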

Why bother reducing?

Plot it: people can only see 2D or 3D, so reducing lets you visualise high-dimensional data.
Speed: fewer features usually means faster training.
Less noise: the components PCA discards carry little variance and are often mostly noise or overlap between columns, so dropping them can sharpen a model.

Watch out: The new PCA columns are weighted combinations of all the originals, so they no longer mean “petal length” or “petal width”. PCA is great for plotting and speed, but you lose the plain-English meaning of each feature.
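You can see these combinations directly. A fitted scikit-learn PCA stores one row of weights per new column in its components_ attribute, and the sketch below prints them next to the iris feature names. It also rebuilds the new columns by hand (centre the data, then multiply by the weights) to show that is all transform does here; the name manual is just an illustrative variable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data
pca = PCA(n_components=2).fit(X)

# Each row of components_ holds the weights one new column
# puts on the 4 original features
for i, weights in enumerate(pca.components_, start=1):
    print('Component', i)
    for name, w in zip(iris.feature_names, weights):
        print('   ', name, '->', round(w, 2))

# Rebuild the new columns by hand: subtract the column means,
# then take the weighted sums
manual = (X - pca.mean_) @ pca.components_.T
print('Matches pca.transform:', np.allclose(manual, pca.transform(X)))
```

Every original feature gets some weight in every component, which is why a component cannot be read as a single measurement.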

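Tip: Scale your features before PCA (next unit) whenever the columns use different units or ranges, because PCA, like KNN, is sensitive to the scale of each column: a column with large numbers dominates the variance. The iris example above skips scaling because all four measurements are in centimetres.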
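As a preview, here is one possible way to combine the two steps, assuming scikit-learn's StandardScaler and make_pipeline are used for the scaling (the next unit may do it differently). Each column is rescaled to mean 0 and standard deviation 1 before PCA runs.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Step 1 scales every column, step 2 reduces 4 columns to 2
scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=2))
X2_scaled = scaled_pca.fit_transform(X)

# make_pipeline names each step after its class, in lowercase
ratios = scaled_pca.named_steps['pca'].explained_variance_ratio_
print('Shape after scaling + PCA:', X2_scaled.shape)
print('Info kept:', round(ratios.sum(), 3))
```

The kept fraction will differ from the unscaled run, because after scaling every column contributes equally to the total variance instead of the widest-ranging column dominating.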

Q. What does PCA do?

Answer: PCA is dimensionality reduction: it compresses many features into a few combined ones, preserving as much information (variance) as possible.

✍️ Practice

Change n_components to 1 and print how much information is kept.
Explain the “shadow” analogy for PCA in your own words.

🏠 Homework

Write 3–4 sentences on when reducing features helps and what you give up by doing it.