
Metrics: Accuracy, Precision, Recall

Accuracy alone can lie — precision, recall and a confusion matrix tell the full story.

What you will learn

Compute accuracy honestly
Read a confusion matrix
Tell precision from recall

Accuracy: the simple score

Accuracy is the share of predictions the model got right: correct ÷ total. It is the first number everyone checks.

Accuracy: how many predictions were correct

from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1]     # the real answers
y_pred = [1, 0, 0, 1, 0, 1]     # what the model guessed

print('Accuracy:', accuracy_score(y_true, y_pred))

Note: Output:

Accuracy: 0.8333333333333334

The model got 5 of 6 right, so accuracy is about 83%. Simple, but as we will see, accuracy can be misleading.

Watch out: Accuracy can lie. If 99 of 100 emails are “not spam”, a lazy model that says “not spam” every time scores 99% accuracy while catching zero spam. You need more than accuracy.
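
To see the trap in numbers, here is a minimal sketch in plain Python. The data is made up to match the example above (100 emails, 1 of them spam), and the names emails_true and emails_pred are illustrative, not part of any library.

```python
# Hypothetical data: 1 spam email (label 1) and 99 normal emails (label 0)
emails_true = [1] + [0] * 99
# The lazy model answers "not spam" (0) for every email
emails_pred = [0] * 100

# Accuracy: share of predictions that match the real label
correct = sum(t == p for t, p in zip(emails_true, emails_pred))
lazy_accuracy = correct / len(emails_true)

# How many of the real spam emails did the model actually flag?
spam_total = sum(t == 1 for t in emails_true)
spam_caught = sum(t == 1 and p == 1 for t, p in zip(emails_true, emails_pred))

print('Accuracy:', lazy_accuracy)
print('Spam caught:', spam_caught, 'of', spam_total)
```

The accuracy comes out at 0.99, yet the model catches 0 of the 1 spam emails.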

The confusion matrix

A confusion matrix breaks results into four boxes: where the model was right and where it slipped.

	Predicted: Yes	Predicted: No
Actually Yes	True Positive (correct)	False Negative (missed)
Actually No	False Positive (false alarm)	True Negative (correct)
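
As a rough sketch, the four boxes can be counted by hand for the six predictions from the accuracy example, treating 1 as Yes and 0 as No. The lists are repeated here so the snippet runs on its own, and the variable names are illustrative. scikit-learn also provides sklearn.metrics.confusion_matrix; with 0/1 labels it orders rows and columns by sorted label, so its output reads [[TN, FP], [FN, TP]] rather than the Yes-first layout of the table above.

```python
y_true = [1, 0, 1, 1, 0, 1]     # the real answers (1 = Yes, 0 = No)
y_pred = [1, 0, 0, 1, 0, 1]     # what the model guessed

pairs = list(zip(y_true, y_pred))

tp = sum(t == 1 and p == 1 for t, p in pairs)  # actually Yes, predicted Yes
fn = sum(t == 1 and p == 0 for t, p in pairs)  # actually Yes, predicted No (missed)
fp = sum(t == 0 and p == 1 for t, p in pairs)  # actually No, predicted Yes (false alarm)
tn = sum(t == 0 and p == 0 for t, p in pairs)  # actually No, predicted No

print('TP:', tp, 'FN:', fn)
print('FP:', fp, 'TN:', tn)
```

For these six predictions the counts are 3 true positives, 1 false negative, 0 false positives and 2 true negatives.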

Precision and recall

From those boxes come two key scores:

Precision — of everything the model flagged as Yes, how much really was Yes? (Few false alarms.)
Recall — of all the real Yes cases, how many did the model catch? (Few misses.)

Precision and recall on the same predictions

from sklearn.metrics import precision_score, recall_score

# Reuses y_true and y_pred from the accuracy example above
print('Precision:', round(precision_score(y_true, y_pred), 2))
print('Recall:   ', round(recall_score(y_true, y_pred), 2))

Note: Output:

Precision: 1.0
Recall:    0.75

Every “Yes” the model predicted was truly Yes (precision 1.0), but it missed one of the four real Yes cases (recall 0.75). High precision, lower recall.

A fully worked example: a disease test

Numbers make this click. Suppose a test was run on 100 patients. In reality 10 are sick and 90 are healthy. The test flags 8 patients as sick, but only 6 of those 8 are truly sick (so 2 are false alarms), and it misses 4 sick patients by calling them healthy. Here is the confusion matrix with real counts:

	Test says: Sick	Test says: Healthy
Actually Sick (10)	TP = 6	FN = 4 (missed!)
Actually Healthy (90)	FP = 2 (false alarm)	TN = 88

Now compute the two scores straight from the boxes:

Precision = TP ÷ (TP + FP) = 6 ÷ (6 + 2) = 6 ÷ 8 = 0.75. Of everyone it flagged, 75% were really sick.
Recall = TP ÷ (TP + FN) = 6 ÷ (6 + 4) = 6 ÷ 10 = 0.60. It caught only 60% of the truly sick.

Accuracy here is (6 + 88) ÷ 100 = 0.94 — which looks great, yet the test missed 4 sick people. That gap between a shiny accuracy and a worrying recall is exactly why you must look past accuracy.
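
The same arithmetic can be written as a short sketch in plain Python, using the four counts from the disease-test table. The variable names are illustrative.

```python
# Counts taken from the disease-test confusion matrix
tp, fn = 6, 4     # actually sick: caught, missed
fp, tn = 2, 88    # actually healthy: false alarm, correctly cleared

total = tp + fn + fp + tn

precision = tp / (tp + fp)       # of everyone flagged, how many were sick
recall = tp / (tp + fn)          # of everyone sick, how many were flagged
accuracy = (tp + tn) / total     # of everyone, how many were labelled correctly

print('Precision:', precision)
print('Recall:   ', recall)
print('Accuracy: ', accuracy)
```

Changing only fn and tp (say, to 2 and 8) is a quick way to see recall move while accuracy barely changes.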

Which score matters more depends on the cost of each mistake. For a disease test, a miss (a false negative, which lowers recall) is dangerous: you would rather raise a few false alarms than send a sick patient home. For a spam filter, a false alarm (a false positive, which lowers precision) sends a real email to the spam folder, where it may never be read, so precision matters more there.

Tip: Remember it as: precision = trust the alarms; recall = catch them all. When both matter, people combine them into a single number called the F1 score, the harmonic mean of precision and recall.
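
For reference, F1 is computed as 2 × precision × recall ÷ (precision + recall). Below is a minimal sketch in plain Python applied to the two examples from this lesson. The helper name f1_score_by_hand is made up for illustration, and returning 0.0 when both inputs are zero is an assumption of this sketch; scikit-learn ships its own sklearn.metrics.f1_score that works directly on label lists.

```python
def f1_score_by_hand(precision, recall):
    """Harmonic mean of precision and recall.

    Returns 0.0 when both are zero (an assumption made here
    to avoid dividing by zero).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Disease test: precision 0.75, recall 0.60
disease_f1 = f1_score_by_hand(0.75, 0.60)

# Six-prediction example: precision 1.0, recall 0.75
small_f1 = f1_score_by_hand(1.0, 0.75)

print('Disease test F1:', round(disease_f1, 2))
print('Small example F1:', round(small_f1, 2))
```

The disease test lands at about 0.67 and the six-prediction example at about 0.86. Because it is a harmonic mean, F1 is pulled toward the weaker of the two scores, so a model cannot earn a high F1 with strong precision and poor recall, or the reverse.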

Q. A model says “not spam” for every email and is 99% accurate because spam is rare. What is its recall on spam?

Answer: Recall measures how many real spam emails were caught. Catching none means 0% recall, even though accuracy looks great — which is why accuracy alone can mislead.

✍️ Practice

Change y_pred to add one false positive and recompute precision and recall.
For a cancer-screening test, say whether precision or recall matters more, and why.

🏠 Homework

Build a confusion matrix on paper for 10 predictions you make up, then compute accuracy, precision and recall.