
Cross-Validation

Test on several different splits and average the scores for a more trustworthy result.

What you will learn

Explain why one split can mislead
Run k-fold cross-validation
Read the average and spread of scores

One split can be lucky or unlucky

A single train/test split depends on which rows landed in the test set. A lucky split flatters your model; an unlucky one punishes it. The score can swing up or down just because of which rows were chosen, not because the model got any better or worse.

Cross-validation reduces this problem by testing on several different splits and averaging the results, which gives a much steadier estimate.
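To see the wobble for yourself, here is a small sketch that trains the same model on five different random train/test splits and prints each test score. The choice of seeds 0 to 4 and a 20% test set (test_size=0.2) are arbitrary assumptions for illustration. Only the random split changes between runs, so any difference in the scores comes from which rows landed in the test set.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

split_scores = []
for seed in range(5):  # assumed seeds 0-4: each one gives a different random split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)
    # Accuracy on this particular test set
    split_scores.append(float(model.score(X_test, y_test)))

print('Single-split scores:', [round(s, 2) for s in split_scores])
print('Lowest:', round(min(split_scores), 2), ' Highest:', round(max(split_scores), 2))
```

If the lowest and highest scores differ, that gap is exactly the "luck of the split" that cross-validation averages away.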

How k-fold works

In k-fold cross-validation you cut the data into k roughly equal folds. Then you train and test k times, each time using a different fold as the test set and the rest for training. Finally you average the k scores. Step by step, with k = 5:

Cut the data into 5 folds (5 roughly equal piles of rows).
Round 1: hold out fold 1 as the test set, train on folds 2–5, and record the score.
Round 2: hold out fold 2 instead, train on the other four folds, record that score.
Keep going until every fold has had exactly one turn as the test set — that is 5 rounds, so 5 scores.
Average the 5 scores. That average is your cross-validation result.
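The five steps above can be sketched by hand with scikit-learn's KFold splitter, which hands back the row numbers for training and testing in each round. This is a minimal sketch under a few assumptions. We shuffle the rows with a fixed seed (shuffle=True, random_state=0), because load_iris stores the flowers sorted by species. The scores it prints will not necessarily match the cross_val_score example later in this lesson, since the folds are cut differently.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

# Step 1: cut the rows into 5 folds. Shuffling (assumed seed 0) mixes the
# species, because load_iris lists all of one species before the next.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
# Steps 2-4: each round holds out a different fold as the test set.
for round_number, (train_idx, test_idx) in enumerate(kfold.split(X), start=1):
    model = LogisticRegression(max_iter=200)
    model.fit(X[train_idx], y[train_idx])               # train on the other 4 folds
    score = float(model.score(X[test_idx], y[test_idx]))  # test on the held-out fold
    fold_scores.append(score)
    print(f'Round {round_number}: {len(test_idx)} test rows, score = {score:.2f}')

# Step 5: average the 5 scores.
print('Average:', round(float(np.mean(fold_scores)), 2))
```

With 150 rows and 5 folds, every round tests on 30 rows and trains on the other 120.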

5-fold cross-validation: every fold gets a turn as the test set

The data, cut into 5 folds, tested 5 times:

Round 1:  [TEST ][train][train][train][train]
Round 2:  [train][TEST ][train][train][train]
Round 3:  [train][train][TEST ][train][train]
Round 4:  [train][train][train][TEST ][train]
Round 5:  [train][train][train][train][TEST ]

Final score = average of the 5 test scores

Note: This diagram shows the idea; it is not program output. Each row is one round: the fold marked TEST is held out, and the other four folds are used for training.
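You can also print the diagram's pattern directly. This sketch uses a toy set of 10 rows, which is an assumption chosen only so the output stays short. It asks KFold (without shuffling) which rows are TEST in each round. Without shuffling, the folds are consecutive chunks, so round 1 tests rows [0, 1], round 2 tests rows [2, 3], and so on up to round 5 with rows [8, 9].

```python
import numpy as np
from sklearn.model_selection import KFold

rows = np.arange(10)       # 10 toy rows, numbered 0-9
kfold = KFold(n_splits=5)  # no shuffling: folds are consecutive chunks of 2 rows

test_folds = []
for round_number, (train_idx, test_idx) in enumerate(kfold.split(rows), start=1):
    test_folds.append(test_idx.tolist())
    print(f'Round {round_number}: TEST = {test_idx.tolist()}, train = {train_idx.tolist()}')
```

Every row appears in exactly one TEST list, which matches the rule that each fold gets exactly one turn as the test set.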

Doing it in scikit-learn

scikit-learn runs all five rounds for you in one line with cross_val_score. Below we load a built-in practice dataset (load_iris, 150 flowers), pick a model, and ask for cv=5 folds. It returns one score per fold, which we then average. (max_iter=200 just gives the model a few more steps to settle; you can ignore it for now.)

5-fold cross-validation in one line

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

scores = cross_val_score(model, X, y, cv=5)   # 5 folds
# float() turns each NumPy score into a plain number so the list prints cleanly
print('Each fold:', [round(float(s), 2) for s in scores])
print('Average:  ', round(scores.mean(), 2))

Output:

Each fold: [0.97, 1.0, 0.93, 0.97, 1.0]
Average:   0.97

Instead of one score that depends on luck, we get five and average them to about 0.97. The small spread (0.93 to 1.0) also suggests the model is stable across splits.
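When cv is a plain number like 5 and the model is a classifier, scikit-learn uses stratified k-fold without shuffling. Each fold keeps roughly the same mix of classes as the full dataset. You can instead pass a splitter object as cv to control how the folds are cut. This sketch shuffles the folds with an arbitrary, assumed seed (random_state=42) so the result is repeatable. The exact scores may differ slightly from the unshuffled run above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# 5 stratified folds, shuffled with a fixed (assumed) seed for repeatability
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
shuffled_scores = cross_val_score(model, X, y, cv=splitter)

print('Each fold:', [round(float(s), 2) for s in shuffled_scores])
print('Average:  ', round(float(shuffled_scores.mean()), 2))
```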

Watch out: Cross-validation trains the model k times, so it takes roughly k times as long as a single split. For huge datasets a single, well-sized test split is sometimes enough; for small datasets, cross-validation is well worth the extra time.
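To confirm that the model really is trained k times, this sketch uses scikit-learn's cross_validate. It works like cross_val_score but also returns how long each training run took. The list of k values (3, 5, 10) is an arbitrary choice for illustration. On a tiny dataset like iris the times will be very small and noisy, but the number of fits always equals k. cross_val_score and cross_validate also accept an n_jobs argument that spreads the rounds across CPU cores.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

fit_counts = {}
for k in (3, 5, 10):  # assumed fold counts to compare
    results = cross_validate(model, X, y, cv=k)
    n_fits = len(results['fit_time'])         # one training run per fold
    total_seconds = results['fit_time'].sum()  # total training time across folds
    fit_counts[k] = n_fits
    print(f'cv={k}: {n_fits} fits, total fit time {total_seconds:.3f} s')
```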

Tip: A tight spread of fold scores suggests the model performs consistently. A wide spread warns that performance depends heavily on which data it sees.
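One common way to put a number on the spread is the standard deviation of the fold scores, reported next to the average. This sketch reruns the 5-fold example and prints the average, the standard deviation (NumPy's std, population form), and the lowest-to-highest range. Reporting "mean ± std" is a convention we are assuming here, not the only option.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)
scores = cross_val_score(model, X, y, cv=5)

mean = float(scores.mean())    # the cross-validation result
spread = float(scores.std())   # how far fold scores typically sit from the mean
low, high = float(scores.min()), float(scores.max())

print(f'Average: {mean:.2f} ± {spread:.2f}')
print(f'Range:   {low:.2f} to {high:.2f}')
```

A small standard deviation relative to the average is what the tip above calls a tight spread.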

Q. Why use cross-validation instead of a single train/test split?

Answer: Cross-validation tests on several folds and averages the results, reducing the luck of any single split and giving a more trustworthy estimate.

✍️ Practice

Change cv=5 to cv=10 and compare the average score.
Explain why a single split might give an unusually high or low score.

🏠 Homework

In your own words, describe 5-fold cross-validation to a classmate, including why we average the scores.