Naive Bayes
A fast, probability-based classifier that is the classic go-to for spam and text.
What you will learn
- Explain Bayes’ idea in plain words
- Understand why it is called “naive”
- Train a Naive Bayes classifier for text
Classifying with probability
Naive Bayes is a classifier built on probability. Instead of drawing a boundary, it asks: given these clues, what is the chance this is spam, and what is the chance it is not? It then picks whichever is more likely.
It is based on Bayes’ theorem, a 250-year-old rule for updating a belief when new evidence arrives. You start with a prior hunch (most email is not spam), then each clue (the word “free”, the word “winner”) nudges that probability up or down.
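For readers who like to see the rule written down, here is a minimal statement of Bayes' theorem for the spam case. The event names "spam" and "clue" are just illustrative labels chosen to match the email example, not notation from any particular textbook.

```latex
P(\text{spam} \mid \text{clue})
  = \frac{P(\text{clue} \mid \text{spam}) \, P(\text{spam})}{P(\text{clue})}
```

Here P(spam) is the prior hunch, P(clue | spam) says how typical the clue is of spam, and the left-hand side is the updated belief after seeing the clue.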
Why is it called “naive”?
It is naive because it pretends every clue is independent — that seeing the word “free” tells you nothing about whether you will also see “money”. In reality those words travel together, so the assumption is plainly wrong. The surprise is that the classifier still works remarkably well, because for picking the more likely class it does not need the probabilities to be perfect, only to lean the right way.
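Written as a formula, the independence shortcut might look like the sketch below. The symbols w_1, ..., w_n (the words in the message) are assumed notation introduced here for illustration.

```latex
P(\text{spam} \mid w_1, \dots, w_n)
  \;\propto\; P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})
```

The product treats each word as if it appeared independently of the others. The same expression is computed for "not spam", and the class with the larger value wins, which is why the values only need to lean the right way.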
A tiny worked example by hand
Suppose out of 100 past emails, 40 were spam and 60 were not. The word “free” appeared in 30 of the 40 spam emails but only 6 of the 60 normal ones. A new email contains “free”. Which class is more likely?
- Chance of “free” if spam = 30 ÷ 40 = 0.75.
- Chance of “free” if not spam = 6 ÷ 60 = 0.10.
- Weight each by how common the class is: spam score = 0.75 × (40÷100) = 0.30; not-spam score = 0.10 × (60÷100) = 0.06.
- Spam’s score (0.30) is much higher than not-spam’s (0.06), so the email is classified as spam.
That is the whole engine: multiply the chance of each clue under each class, weight by how common the class is, and pick the winner. With many words you just multiply more terms together.
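The hand calculation above can be reproduced in a few lines of plain Python. This is only a sketch using the counts from the example; the variable names are made up for illustration. It also adds one step the hand version skips: dividing each score by their sum turns the scores into probabilities that add up to 1.

```python
# Counts taken from the worked example above.
total_emails = 100
spam_emails, ham_emails = 40, 60      # ham = not spam
free_in_spam, free_in_ham = 30, 6     # emails containing "free"

# Prior: how common each class is before looking at any words.
prior_spam = spam_emails / total_emails
prior_ham = ham_emails / total_emails

# Likelihood: chance of seeing "free" within each class.
likelihood_spam = free_in_spam / spam_emails
likelihood_ham = free_in_ham / ham_emails

# Score = likelihood x prior (the numerator of Bayes' theorem).
score_spam = likelihood_spam * prior_spam
score_ham = likelihood_ham * prior_ham

# Optional: normalise so the two values sum to 1.
evidence = score_spam + score_ham
posterior_spam = score_spam / evidence
posterior_ham = score_ham / evidence

prediction = 'spam' if score_spam > score_ham else 'not spam'
print(f'spam score = {score_spam:.2f}, not-spam score = {score_ham:.2f}')
print(f'P(spam | free) = {posterior_spam:.2f}, P(not spam | free) = {posterior_ham:.2f}')
print('Prediction ->', prediction)
```

The scores match the hand calculation (0.30 and 0.06), and after normalising, the chance of spam comes out at about 0.83. Normalising never changes which class wins, so the classifier can skip it when only the label is needed.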
Doing it in scikit-learn
Text first has to become numbers. CountVectorizer turns each message into word counts; then MultinomialNB (the Naive Bayes flavour made for word counts) learns from them.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
texts = ['win free money now', 'free entry win prize',
         'lunch at noon today', 'see you at the meeting']
labels = ['spam', 'spam', 'ham', 'ham'] # ham = not spam
vec = CountVectorizer()
X = vec.fit_transform(texts) # texts -> word-count numbers
model = MultinomialNB()
model.fit(X, labels)
new = vec.transform(['free prize today'])
print('Prediction ->', model.predict(new)[0])
Output: Prediction -> spam
Note: The words “free” and “prize” appeared in spam during training, so Naive Bayes judges the new message more likely to be spam than ham. “today” pulled the other way, but the spammy words won.
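To see how strongly the model leans rather than just the winning label, one option is to ask for class probabilities. The sketch below rebuilds the same toy model so it runs on its own, then calls predict_proba. With only four training messages the numbers are illustrative at best and should not be read as well-calibrated probabilities.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Same toy data as in the example above.
texts = ['win free money now', 'free entry win prize',
         'lunch at noon today', 'see you at the meeting']
labels = ['spam', 'spam', 'ham', 'ham']  # ham = not spam

vec = CountVectorizer()
model = MultinomialNB()
model.fit(vec.fit_transform(texts), labels)

new = vec.transform(['free prize today'])

# predict_proba returns one row per message, one column per class,
# in the order given by model.classes_.
probs = model.predict_proba(new)[0]
for cls, p in zip(model.classes_, probs):
    print(f'{cls}: {p:.2f}')
```

The two printed values sum to 1, and the larger one belongs to the class that predict returns.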
Tip: Naive Bayes is fast, needs little data, and is the classic baseline for text problems (spam, sentiment, topic tagging). Always try it first on a text task before reaching for anything heavier.
Watch out: Because it multiplies probabilities, a word never seen with a class would force that class's whole product to zero, no matter what the other words say. scikit-learn avoids this with additive (Laplace) smoothing, controlled by the alpha parameter of MultinomialNB (default 1.0), which adds a small pseudo-count to every word, so you rarely have to worry.
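The zero problem and the smoothing fix can be checked by hand on the toy messages from the scikit-learn example. This is a minimal sketch in plain Python, assuming simple whitespace tokenisation. The helper word_prob is hypothetical and written only for this illustration; it is not a scikit-learn function.

```python
from collections import Counter

spam_texts = ['win free money now', 'free entry win prize']
ham_texts = ['lunch at noon today', 'see you at the meeting']

# Word counts per class, splitting on whitespace.
spam_counts = Counter(' '.join(spam_texts).split())
ham_counts = Counter(' '.join(ham_texts).split())
vocab = set(spam_counts) | set(ham_counts)

def word_prob(word, counts, alpha):
    """Estimate P(word | class) with additive smoothing strength alpha."""
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * len(vocab))

# "today" never appears in a spam message.
unsmoothed = word_prob('today', spam_counts, alpha=0.0)
smoothed = word_prob('today', spam_counts, alpha=1.0)

print('without smoothing:', unsmoothed)
print('with alpha = 1   :', smoothed)
```

Without smoothing the estimate is exactly zero, which would wipe out the spam score of any message containing "today". With alpha = 1 every word gets one extra pretend count, so the estimate becomes small but non-zero.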
Q. Why is Naive Bayes called “naive”?
✍️ Practice
- Add the message 'free meeting today' to the prediction step and see which class wins.
- By hand, redo the “free” example but assume 20 of 40 spam emails contained “free”. Does the answer change?
🏠 Homework
- Explain Bayes’ idea (prior belief plus evidence) in 3–4 sentences using a real example of your own, such as predicting rain from clouds.