
Support Vector Machines (SVM)

Draw the widest possible street between two classes — and bend it with a kernel when a straight line will not do.

What you will learn

Explain the “widest street” idea behind SVM
Train an SVC with scikit-learn
Switch kernels to separate curved data

Separate classes with the widest gap

A support vector machine (SVM) is a classifier. Like logistic regression it draws a boundary between two classes — but it does something special: it finds the boundary that leaves the widest possible gap on either side.

Picture two groups of dots, reds on the left and blues on the right. Many lines could separate them. SVM picks the line that sits as far as possible from the nearest dot of each group. That gap is called the margin, and a wider margin usually means the boundary works better on new data.

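The few dots that sit right on the edge of the margin, the ones the boundary has to balance between, are the support vectors. They are the only points that determine the boundary: move or remove any other dot (without pushing it into the margin) and the boundary stays where it is. That is where the name comes from.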

A worked example: pass vs fail

We will reuse the familiar study data. Each student has one feature (hours studied) and a label (0 = fail, 1 = pass). SVC is scikit-learn’s support vector classifier.

A linear SVM separating fail from pass

from sklearn.svm import SVC

X = [[1],[2],[3],[4],[5],[6],[7],[8]]   # hours studied
y = [ 0,  0,  0,  0,  1,  1,  1,  1]     # 0 = fail, 1 = pass

model = SVC(kernel='linear')   # a straight-line boundary
model.fit(X, y)

print('Studied 3h ->', model.predict([[3]])[0])
print('Studied 6h ->', model.predict([[6]])[0])

Note: Output:

Studied 3h -> 0
Studied 6h -> 1

The SVM placed its boundary in the empty space between the last fail (4 hours) and the first pass (5 hours), roughly midway at about 4.5 hours. It predicts fail below that point and pass above it.
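To see which students the model actually leaned on, you can inspect the fitted model. The sketch below refits the same pass/fail data and reads scikit-learn's `support_vectors_`, `coef_` and `intercept_` attributes (`coef_` is only available with the linear kernel). The variable names `w`, `b` and `boundary` are ours, chosen for illustration.

```python
from sklearn.svm import SVC

X = [[1], [2], [3], [4], [5], [6], [7], [8]]   # hours studied
y = [0, 0, 0, 0, 1, 1, 1, 1]                   # 0 = fail, 1 = pass

model = SVC(kernel='linear').fit(X, y)

# The training points the boundary depends on
svs = sorted(model.support_vectors_.ravel().tolist())
print('support vectors (hours):', svs)

# For a linear kernel with one feature the decision rule is w * hours + b,
# so the boundary sits where that expression equals zero.
w = model.coef_[0][0]
b = model.intercept_[0]
boundary = -b / w
print('boundary at about', round(boundary, 2), 'hours')
```

With this data the support vectors should be the two students closest to the gap, and the boundary should fall between them. The other six students could be removed without changing the result.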

When a straight line is not enough: the kernel trick

Sometimes no straight line can separate the classes: imagine one class forming a ring around the other. The clever fix is the kernel trick. A kernel measures how similar two points are in a way that acts as if the data had been lifted into a higher dimension, where a flat boundary can split it. Seen from the original space, that flat boundary becomes a curve. You never compute the higher-dimensional coordinates yourself; the kernel handles that behind the scenes.
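To make the "lifting" idea concrete, here is a hand-made version of it. This is only an illustration, not what SVC does internally: we add a second feature ourselves and keep the linear kernel. The helper `lift` and the centre value 5.5 are assumptions picked by looking at this toy data, where class 1 sits between two groups of class 0.

```python
from sklearn.svm import SVC

# Class 1 sits in the middle of a number line, class 0 on both ends
X = [[1], [2], [5], [6], [9], [10]]
y = [0, 0, 1, 1, 0, 0]

def lift(rows):
    """Hypothetical hand-made lift: x -> (x, (x - 5.5)**2)."""
    return [[x, (x - 5.5) ** 2] for (x,) in rows]

flat = SVC(kernel='linear').fit(X, y)          # 1 feature: no straight cut works
lifted = SVC(kernel='linear').fit(lift(X), y)  # 2 features: a straight cut works

flat_score = flat.score(X, y)
lifted_score = lifted.score(lift(X), y)
print('1-D training score:', round(flat_score, 2))
print('2-D training score:', round(lifted_score, 2))
print('5.5 in lifted space ->', lifted.predict(lift([[5.5]]))[0])
```

In the lifted space the middle points have a small second coordinate and the outer points a large one, so a straight cut separates them. A kernel such as rbf gets a similar effect without you having to invent the extra feature.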

You pick a kernel with SVC's kernel parameter. Here is what each one does, in plain words:

kernel	Boundary shape	Use when
`'linear'`	A straight line / flat plane	Classes split cleanly with a straight cut
`'poly'`	A polynomial curve (degree 3 by default)	The split bends a little
`'rbf'` (default)	Flexible, wraps around blobs	Classes are curved or surrounded — a great default

Worked example: curved data needs RBF

The data below cannot be split by a straight line — the 1 class sits in the middle, flanked by the 0 class on both sides. We compare a linear kernel (which must fail) with an rbf kernel (which can curve around it).

A linear kernel cannot wrap the middle class; rbf can

from sklearn.svm import SVC

# class 1 is trapped in the middle, class 0 is on both ends
X = [[1],[2],[5],[6],[9],[10]]
y = [ 0,  0,  1,  1,  0,  0 ]

for k in ['linear', 'rbf']:
    m = SVC(kernel=k).fit(X, y)
    # round the score so 4/6 prints as 0.67 rather than 0.6666666666666666
    print(k, 'predicts 5.5 ->', m.predict([[5.5]])[0],
          '| training score', round(m.score(X, y), 2))

Note: Output:

linear predicts 5.5 -> 0 | training score 0.67
rbf predicts 5.5 -> 1 | training score 1.0

The straight-line kernel cannot trap the middle class. It ends up labelling every training point 0, which gets only 4 of the 6 right (a score of about 0.67). The rbf kernel curves around the middle class, correctly calls 5.5 a 1, and fits the training data perfectly. That is the kernel trick earning its keep.
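If you want to see where the rbf boundary actually falls, one option is to predict over a grid of values and look at where the label flips. The sketch below assumes NumPy is installed; the grid range (0 to 11 in steps of 0.5) and the names `grid`, `labels` and `ones` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.svm import SVC

X = [[1], [2], [5], [6], [9], [10]]
y = [0, 0, 1, 1, 0, 0]

model = SVC(kernel='rbf').fit(X, y)

# Evenly spaced test values: 0.0, 0.5, ..., 11.0
grid = np.arange(0, 11.5, 0.5).reshape(-1, 1)
labels = model.predict(grid)

# Grid values the model assigns to class 1
ones = grid[labels == 1].ravel()
print('predicted 1 from', ones.min(), 'to', ones.max())
```

The values predicted as 1 should form a single band around the middle of the line, with 0 on both sides. That band is the curved boundary seen in one dimension.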

Watch out: SVM works from distances between points, so, like KNN, it needs feature scaling (covered in the Feature Scaling lesson). Train an SVM on unscaled data and a column with large numbers will dominate the margin.
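A common way to handle this is to chain the scaler and the classifier so the scaling is learned during fit and reapplied automatically at predict time. The sketch below uses scikit-learn's `make_pipeline` and `StandardScaler`. The data is made up for illustration: the second column (a large-valued number such as household income) is an assumption and has nothing to do with the label.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Made-up data: [hours studied, household income]. Only hours relates to the label.
X = [[1, 52000], [2, 48000], [3, 51000], [4, 49000],
     [5, 51000], [6, 49000], [7, 52000], [8, 48000]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# StandardScaler rescales each column to mean 0 and standard deviation 1,
# then SVC trains on the rescaled values.
model = make_pipeline(StandardScaler(), SVC(kernel='linear'))
model.fit(X, y)

# New rows go through the same scaling before prediction
preds = model.predict([[2, 50000], [7, 50000]])
print(preds)
```

Because the scaler lives inside the pipeline, you pass raw values to `predict` and never have to remember to scale them by hand.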

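Tip: SVMs shine on small-to-medium datasets with clear gaps between classes, especially when there are many features. Training time for SVC grows at least quadratically with the number of samples, so it gets slow on very large datasets (beyond tens of thousands of rows), where tree ensembles are usually faster.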

Q. What does a support vector machine try to do?

Answer: An SVM picks the boundary that maximises the margin — the gap to the nearest point of each class — and uses a kernel to handle curved separations.

✍️ Practice

Change the kernel to 'poly' on the curved example and compare its training score with rbf.
Explain in one sentence why an SVM, like KNN, needs its features scaled first.

🏠 Homework

In your own words, explain the “widest street” idea and what a support vector is, then describe one situation where you would prefer the rbf kernel over linear.