Metrics: Accuracy, Precision, Recall
Accuracy alone can lie — precision, recall and a confusion matrix tell the full story.
What you will learn
- Compute accuracy honestly
- Read a confusion matrix
- Tell precision from recall
Accuracy: the simple score
Accuracy is the share of predictions the model got right: correct ÷ total. It is the first number everyone checks.
from sklearn.metrics import accuracy_score
y_true = [1, 0, 1, 1, 0, 1] # the real answers
y_pred = [1, 0, 0, 1, 0, 1] # what the model guessed
print('Accuracy:', accuracy_score(y_true, y_pred))
Note: Output: Accuracy: 0.8333333333333334. The model got 5 of 6 right, so about 83% accuracy. Simple, but as we will see, accuracy can be misleading.
Watch out: Accuracy can lie. If 99 of 100 emails are “not spam”, a lazy model that says “not spam” every time scores 99% accuracy while catching zero spam. You need more than accuracy.
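To see the warning in numbers, here is a minimal sketch of that lazy model in plain Python. The lists `emails_true` and `emails_pred` are made up for illustration (1 = spam, 0 = not spam), matching the 99-of-100 scenario described above.

```python
# Hypothetical inbox: 99 "not spam" emails (0) and 1 spam email (1).
emails_true = [0] * 99 + [1]
# A lazy model that answers "not spam" for every single email.
emails_pred = [0] * 100

# Accuracy: share of predictions that match the real answers.
correct = sum(1 for t, p in zip(emails_true, emails_pred) if t == p)
accuracy = correct / len(emails_true)

# How many real spam emails did the model actually flag?
spam_caught = sum(1 for t, p in zip(emails_true, emails_pred) if t == 1 and p == 1)

print('Accuracy:', accuracy)
print('Spam caught:', spam_caught)
```

The score looks excellent, yet the one email that mattered slipped through.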
The confusion matrix
A confusion matrix breaks results into four boxes: where the model was right and where it slipped.
| | Predicted: Yes | Predicted: No |
|---|---|---|
| Actually Yes | True Positive (correct) | False Negative (missed) |
| Actually No | False Positive (false alarm) | True Negative (correct) |
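As a rough sketch, the four boxes can be counted by hand for the six predictions from the accuracy example. The variable names `tp`, `fn`, `fp` and `tn` are chosen here for illustration; the lists are repeated so the snippet runs on its own.

```python
y_true = [1, 0, 1, 1, 0, 1]  # the real answers
y_pred = [1, 0, 0, 1, 0, 1]  # what the model guessed

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # said Yes, really Yes
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # said No, really Yes (missed)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # said Yes, really No (false alarm)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # said No, really No

print('TP:', tp, 'FN:', fn)
print('FP:', fp, 'TN:', tn)
```

scikit-learn also offers `confusion_matrix` in `sklearn.metrics`. By default it orders rows and columns by sorted label, so the "No" class (0) comes first, which is the reverse of the table layout above.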
Precision and recall
From those boxes come two key scores:
- Precision — of everything the model flagged as Yes, how much really was Yes? (Few false alarms.)
- Recall — of all the real Yes cases, how many did the model catch? (Few misses.)
from sklearn.metrics import precision_score, recall_score
print('Precision:', round(precision_score(y_true, y_pred), 2))
print('Recall: ', round(recall_score(y_true, y_pred), 2))
Note: Output: Precision: 1.0 Recall: 0.75. Every “Yes” the model predicted was truly Yes (precision 1.0), but it missed one real Yes (recall 0.75). High precision, slightly lower recall.
A fully worked example: a disease test
Numbers make this click. Suppose a test was run on 100 patients. In reality 10 are sick and 90 are healthy. The test flags 8 patients as sick, but only 6 of those 8 are truly sick (so 2 are false alarms), and it misses 4 sick patients by calling them healthy. Here is the confusion matrix with real counts:
| | Test says: Sick | Test says: Healthy |
|---|---|---|
| Actually Sick (10) | TP = 6 | FN = 4 (missed!) |
| Actually Healthy (90) | FP = 2 (false alarm) | TN = 88 |
Now compute the two scores straight from the boxes:
- Precision = TP ÷ (TP + FP) = 6 ÷ (6 + 2) = 6 ÷ 8 = 0.75. Of everyone it flagged, 75% were really sick.
- Recall = TP ÷ (TP + FN) = 6 ÷ (6 + 4) = 6 ÷ 10 = 0.60. It caught only 60% of the truly sick.
Accuracy here is (6 + 88) ÷ 100 = 0.94 — which looks great, yet the test missed 4 sick people. That gap between a shiny accuracy and a worrying recall is exactly why you must look past accuracy.
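The same arithmetic can be checked in a few lines of Python. This is only a sketch that plugs in the four counts from the disease-test table; the variable names are chosen for illustration.

```python
# Counts taken from the disease-test confusion matrix.
tp, fn = 6, 4    # sick patients: caught vs missed
fp, tn = 2, 88   # healthy patients: false alarms vs correctly cleared

total = tp + fn + fp + tn
accuracy = (tp + tn) / total   # correct ÷ total
precision = tp / (tp + fp)     # of everyone flagged, how many were sick
recall = tp / (tp + fn)        # of everyone sick, how many were flagged

print('Accuracy: ', accuracy)
print('Precision:', precision)
print('Recall:   ', recall)
```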
Which matters more depends on the cost of each mistake. For a disease test, a miss (low recall) is dangerous: you would rather raise a few false alarms than send a sick patient home. For a spam filter, a false alarm (low precision) sends a real email to the spam folder, where it may never be read, so precision matters more there.
Tip: Remember it as: precision = trust the alarms; recall = catch them all. When both matter, people combine them into a single number called the F1 score.
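For reference, the F1 score is the harmonic mean of precision and recall: F1 = 2 × precision × recall ÷ (precision + recall). A minimal sketch, reusing the precision and recall values from the disease-test example:

```python
precision = 0.75  # from the disease-test example
recall = 0.60     # from the disease-test example

# Harmonic mean: pulled toward the lower of the two scores.
f1 = 2 * precision * recall / (precision + recall)

print('F1:', round(f1, 2))
```

The result sits between the two scores but closer to the weaker one, so a model cannot earn a high F1 by doing well on only one of them.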
Q. A model says “not spam” for every email and is 99% accurate because spam is rare. What is its recall on spam?
✍️ Practice
- Change y_pred to add one false positive and recompute precision and recall.
- For a cancer-screening test, say whether precision or recall matters more, and why.
🏠 Homework
- Build a confusion matrix on paper for 10 predictions you make up, then compute accuracy, precision and recall.