Support Vector Machines (SVM)
Draw the widest possible street between two classes — and bend it with a kernel when a straight line will not do.
What you will learn
- Explain the “widest street” idea behind SVM
- Train an SVC with scikit-learn
- Switch kernels to separate curved data
Separate classes with the widest gap
A support vector machine (SVM) is a classifier. Like logistic regression it draws a boundary between two classes — but it does something special: it finds the boundary that leaves the widest possible gap on either side.
Picture two groups of dots, reds on the left and blues on the right. Many lines could separate them. SVM picks the line that sits as far as possible from the nearest dot of each group. That gap is called the margin, and a wider margin usually means the boundary works better on new data.
The few dots that sit right on the edge of the margin — the ones the boundary has to balance between — are the support vectors. They are the only points that matter; that is where the name comes from.
A worked example: pass vs fail
We will reuse the familiar study data. Each student has one feature (hours studied) and a label (0 = fail, 1 = pass). SVC is scikit-learn’s support vector classifier.
from sklearn.svm import SVC
X = [[1],[2],[3],[4],[5],[6],[7],[8]] # hours studied
y = [ 0, 0, 0, 0, 1, 1, 1, 1] # 0 = fail, 1 = pass
model = SVC(kernel='linear') # a straight-line boundary
model.fit(X, y)
print('Studied 3h ->', model.predict([[3]])[0])
print('Studied 6h ->', model.predict([[6]])[0])Note: Output: Studied 3h -> 0 Studied 6h -> 1 The SVM placed its boundary in the wide empty space around the 4-hour mark — the gap with the most room on both sides — then predicts fail below it and pass above it.
When a straight line is not enough: the kernel trick
Sometimes no straight line can separate the classes — imagine one class forming a ring around the other. The clever fix is the kernel trick: the kernel quietly lifts the data into a higher dimension where a straight line can split it, then projects the boundary back down as a curve. You never compute that higher dimension yourself; the kernel does it behind the scenes.
You pick a kernel with the kernel setting. Here is what each one does, in plain words:
| kernel | Boundary shape | Use when |
|---|---|---|
'linear' | A straight line / flat plane | Classes split cleanly with a straight cut |
'poly' | A gentle curve | The split bends a little |
'rbf' (default) | Flexible, wraps around blobs | Classes are curved or surrounded — a great default |
Worked example: curved data needs RBF
The data below cannot be split by a straight line — the 1 class sits in the middle, flanked by the 0 class on both sides. We compare a linear kernel (which must fail) with an rbf kernel (which can curve around it).
from sklearn.svm import SVC
# class 1 is trapped in the middle, class 0 is on both ends
X = [[1],[2],[5],[6],[9],[10]]
y = [ 0, 0, 1, 1, 0, 0 ]
for k in ['linear', 'rbf']:
m = SVC(kernel=k).fit(X, y)
print(k, 'predicts 5.5 ->', m.predict([[5.5]])[0],
'| training score', m.score(X, y))Note: Output: linear predicts 5.5 -> 0 | training score 0.67 rbf predicts 5.5 -> 1 | training score 1.0 The straight-line kernel cannot trap the middle class, so it scores only 0.67. The rbf kernel curves around it, correctly calls 5.5 a 1, and fits the data perfectly. That is the kernel trick earning its keep.
Watch out: SVM compares distances, so — like KNN — it needs feature scaling (covered in the Feature Scaling lesson). Train an SVM on un-scaled data and a big-numbered column will dominate the margin.
Tip: SVMs shine on small-to-medium datasets with clear gaps between classes, especially when there are many features. They get slow on very large datasets, where tree ensembles are usually faster.
Q. What does a support vector machine try to do?
✍️ Practice
- Change the
kernelto'poly'on the curved example and compare its training score withrbf. - Explain in one sentence why an SVM, like KNN, needs its features scaled first.
🏠 Homework
- In your own words, explain the “widest street” idea and what a support vector is, then describe one situation where you would prefer the
rbfkernel overlinear.