Naive Bayes

A fast, probability-based classifier that is the classic go-to for spam filtering and other text tasks.

What you will learn

Explain Bayes’ idea in plain words
Understand why it is called “naive”
Train a Naive Bayes classifier for text

Classifying with probability

Naive Bayes is a classifier built on probability. Instead of drawing a boundary, it asks two questions: given these clues, what is the chance this is spam, and what is the chance it is not? It then picks whichever is more likely.

It is based on Bayes’ theorem, a 250-year-old rule for updating a belief when new evidence arrives. You start with a prior hunch (most email is not spam), then each clue (the word “free”, the word “winner”) nudges that probability up or down.

Why is it called “naive”?

It is naive because it pretends every clue is independent of the others once the class is known: within spam, seeing the word “free” is assumed to tell you nothing about whether you will also see “money”. In reality those words travel together, so the assumption is plainly wrong. The surprise is that the classifier still works remarkably well, because for picking the more likely class it does not need the probabilities to be perfect, only to lean the right way.

A tiny worked example by hand

Suppose out of 100 past emails, 40 were spam and 60 were not. The word “free” appeared in 30 of the 40 spam emails but only 6 of the 60 normal ones. A new email contains “free”. Which class is more likely?

Chance of “free” if spam = 30 ÷ 40 = 0.75.
Chance of “free” if not spam = 6 ÷ 60 = 0.10.
Weight each by how common the class is: spam score = 0.75 × (40÷100) = 0.30; not-spam score = 0.10 × (60÷100) = 0.06.
Spam’s score (0.30) is much higher than not-spam’s (0.06), so the email is classified as spam.
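
The same arithmetic can be checked in a few lines of Python. This is a minimal sketch using only the counts from the example above; the variable names are chosen here for illustration. The last step, dividing by the sum of the two scores, is not needed to pick a winner, but it turns the raw scores into a probability that the email is spam.

```python
# Counts taken from the worked example above.
total_emails = 100
spam_emails, ham_emails = 40, 60          # ham = not spam
free_in_spam, free_in_ham = 30, 6

# Likelihoods: chance of seeing "free" inside each class.
p_free_given_spam = free_in_spam / spam_emails   # 0.75
p_free_given_ham = free_in_ham / ham_emails      # 0.10

# Priors: how common each class is overall.
p_spam = spam_emails / total_emails              # 0.40
p_ham = ham_emails / total_emails                # 0.60

# Scores: likelihood weighted by prior.
spam_score = p_free_given_spam * p_spam          # 0.30
ham_score = p_free_given_ham * p_ham             # 0.06

# Optional: rescale the two scores so they sum to 1.
p_spam_given_free = spam_score / (spam_score + ham_score)

print("spam score:", round(spam_score, 2))
print("ham score: ", round(ham_score, 2))
print("P(spam | 'free'):", round(p_spam_given_free, 3))
```

The rescaled value comes out at about 0.83, so on these counts an email containing “free” is spam roughly five times out of six.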

That is the whole engine: multiply the chance of each clue under each class, weight by how common the class is, and pick the winner. With many words you just multiply more terms together.
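
Extending this to several words might look like the sketch below. The priors and the “free” likelihoods come from the worked example; the “winner” likelihoods are invented purely for illustration, as are the helper names `log_score` and `classify`. Instead of multiplying, the sketch adds logarithms, which gives the same winner and avoids very small products rounding to zero when a message has many words.

```python
import math

# Priors and "free" likelihoods come from the worked example.
# The "winner" likelihoods are hypothetical numbers for illustration only.
priors = {"spam": 0.40, "ham": 0.60}
likelihoods = {
    "spam": {"free": 0.75, "winner": 0.50},   # "winner" value is made up
    "ham":  {"free": 0.10, "winner": 0.05},   # "winner" value is made up
}

def log_score(words, label):
    """Return log(prior) plus the sum of log(likelihood) for each word."""
    total = math.log(priors[label])
    for word in words:
        total += math.log(likelihoods[label][word])
    return total

def classify(words):
    """Pick the class with the highest log score."""
    return max(priors, key=lambda label: log_score(words, label))

message = ["free", "winner"]
for label in priors:
    print(label, round(log_score(message, label), 3))
print("Prediction ->", classify(message))
```

Because the logarithm is an increasing function, the class with the largest log score is also the class with the largest product.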

Doing it in scikit-learn

Text first has to become numbers. CountVectorizer turns each message into word counts; then MultinomialNB (the Naive Bayes flavour made for word counts) learns from them.

A Naive Bayes spam classifier on tiny text data

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts  = ['win free money now', 'free entry win prize',
          'lunch at noon today', 'see you at the meeting']
labels = ['spam', 'spam', 'ham', 'ham']   # ham = not spam

vec = CountVectorizer()
X = vec.fit_transform(texts)      # texts -> word-count numbers

model = MultinomialNB()
model.fit(X, labels)

new = vec.transform(['free prize today'])
print('Prediction ->', model.predict(new)[0])

Output: Prediction -> spam

Note: The words “free” and “prize” appeared only in spam during training, so Naive Bayes judges the new message more likely to be spam than ham. “today” pulled the other way, but the spammy words won.
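
To see what the model is actually working with, the sketch below reruns the same tiny example and prints the vocabulary, the word-count table, and the probability assigned to each class. It relies on the `vocabulary_`, `classes_` and `predict_proba` members that scikit-learn provides; the variable names `vocab` and `probs` are chosen here for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['win free money now', 'free entry win prize',
         'lunch at noon today', 'see you at the meeting']
labels = ['spam', 'spam', 'ham', 'ham']   # ham = not spam

vec = CountVectorizer()
X = vec.fit_transform(texts)

# vocabulary_ maps each word to its column index; sort words by that index
# so the list lines up with the columns of X.
vocab = sorted(vec.vocabulary_, key=vec.vocabulary_.get)
print(vocab)
print(X.toarray())        # one row per message, one column per word

model = MultinomialNB().fit(X, labels)

new = vec.transform(['free prize today'])
# predict_proba returns one probability per class, ordered like model.classes_.
probs = {str(c): float(p) for c, p in zip(model.classes_, model.predict_proba(new)[0])}
print(probs)
```

With default settings the spam probability for this message should come out a little above three quarters, which is a clear lead but not an overwhelming one on such a small training set.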

Tip: Naive Bayes is fast, needs little data, and is the classic baseline for text problems (spam, sentiment, topic tagging). Always try it first on a text task before reaching for anything heavier.

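Watch out: Because it multiplies probabilities, a word never seen with a class would force the whole product for that class to zero, no matter what the other words say. scikit-learn avoids this by default with additive (Laplace) smoothing, controlled by the alpha parameter of MultinomialNB (default 1.0), which behaves as if every word had been seen a little extra in every class. You rarely have to worry about it, but it helps to know why no probability ever comes out as exactly zero.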
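
The zero problem is easy to reproduce by hand. The sketch below is a hand-rolled illustration, not scikit-learn's own code: it counts words per class in the training texts from this lesson, then scores 'free prize today' with and without additive smoothing. The `score` helper and the plain whitespace split are assumptions made for this example; `alpha` mirrors the name of the MultinomialNB parameter.

```python
from collections import Counter

texts = ['win free money now', 'free entry win prize',
         'lunch at noon today', 'see you at the meeting']
labels = ['spam', 'spam', 'ham', 'ham']   # ham = not spam

# Count how often each word appears in each class.
# Splitting on whitespace is a simplification that suits this toy data.
counts = {'spam': Counter(), 'ham': Counter()}
for text, label in zip(texts, labels):
    counts[label].update(text.split())

vocab = set(counts['spam']) | set(counts['ham'])

def score(words, label, alpha):
    """Prior times the product of (optionally smoothed) word likelihoods."""
    result = labels.count(label) / len(labels)        # prior
    total_words = sum(counts[label].values())
    for word in words:
        # Additive smoothing: add alpha to every word count in every class.
        result *= (counts[label][word] + alpha) / (total_words + alpha * len(vocab))
    return result

message = 'free prize today'.split()
for alpha in (0.0, 1.0):
    scores = {label: score(message, label, alpha) for label in counts}
    print('alpha =', alpha, scores)
```

With `alpha = 0.0` both classes score exactly zero, because “today” never appears in spam and “free” and “prize” never appear in ham, so the classifier has nothing to choose between. With `alpha = 1.0` every likelihood is positive and spam comes out ahead.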

Q. Why is Naive Bayes called “naive”?

Answer: Naive Bayes naively assumes the features are independent of one another. The assumption is usually false, yet the classifier still performs well, especially on text.

✍️ Practice

Add the message 'free meeting today' to the prediction step and see which class wins.
By hand, redo the “free” example but assume 20 of 40 spam emails contained “free” — does the answer change?

🏠 Homework

Explain Bayes’ idea (prior belief plus evidence) in 3–4 sentences using a real example of your own, such as predicting rain from clouds.