Boosting & Gradient Boosting
Build trees one after another, each correcting the mistakes of the ones before it. This is the technique behind many of the strongest models for tabular data.
What you will learn
- Tell bagging apart from boosting
- Explain how boosting learns from its own errors
- Train a gradient boosting model and compare it to a forest
A different way to combine trees
A random forest grows many trees independently (so they can be built in parallel), each on a random sample of the rows, then lets them vote. That is called bagging. Boosting takes the opposite approach: it grows trees one at a time, in a chain, and each new tree focuses on the rows the previous trees got wrong.
Picture a team marking exam papers. The first marker does a rough job. The second marker does not re-mark everything — they concentrate on the papers the first one messed up. The third fixes what is still wrong, and so on. Each step boosts the result by patching the remaining mistakes. Add up everyone’s corrections and the final marking is excellent.
| | Bagging (Random Forest) | Boosting (Gradient Boosting) |
|---|---|---|
| Trees built | In parallel, independently | One after another, in a chain |
| Each tree focuses on | A random slice of data | The previous trees’ mistakes |
| Main strength | Stable, hard to break | Often the highest accuracy |
| Main risk | Slightly less accurate | Can overfit if pushed too hard |
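The contrast in the table can be seen in scikit-learn's own objects. The snippet below is a small illustrative sketch on a made-up eight-row dataset (the data and variable names are assumptions, not part of the lesson): it checks which model offers a parallel-training setting and what kind of tree each ensemble holds.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Tiny made-up two-class dataset, just enough to fit both models.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3], [3, 2], [3, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)
boost = GradientBoostingClassifier(n_estimators=5, random_state=0).fit(X, y)

# Bagging: independent trees can be trained in parallel, so the forest
# exposes an n_jobs setting. Boosting builds a chain, so it has none.
forest_has_n_jobs = "n_jobs" in forest.get_params()
boost_has_n_jobs = "n_jobs" in boost.get_params()

# Bagging: every tree is a full classifier that casts its own vote.
forest_tree_type = type(forest.estimators_[0]).__name__
# Boosting: every tree is a regressor that predicts a numeric correction.
boost_tree_type = type(boost.estimators_[0, 0]).__name__

print("forest:", forest_tree_type, "| n_jobs available:", forest_has_n_jobs)
print("boost: ", boost_tree_type, "| n_jobs available:", boost_has_n_jobs)
```

The boosting trees are regressors even though the task is classification, because each one predicts a correction to add to the running score rather than a class label.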
How gradient boosting learns from errors
The most popular form is gradient boosting. “Gradient” refers to the direction in which the error shrinks fastest: after each tree, the model measures the leftover error and points the next tree at it. Step by step:
- Make a first simple prediction (often just the average).
- Measure the error — how far off each row is (the leftover, called the residual).
- Train a small tree to predict that error, and add its correction to the running prediction.
- Measure the new, smaller error and repeat — each tree shrinks what is left.
- Stop after a set number of trees; the sum of all the corrections is the final model.
A learning rate controls how big each correction is. Small, careful steps (a low learning rate) with more trees usually generalise better than a few big leaps.
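The loop described above can be written out by hand for a regression problem. This is a minimal sketch, assuming squared error (so the error to chase is simply the residual), made-up data, and scikit-learn's DecisionTreeRegressor as the "small tree". Names such as n_trees, baseline and predict are illustrative only, and real libraries add many refinements on top.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (made up for illustration): y is a noisy curve of x.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

n_trees = 50          # how many corrections to add
learning_rate = 0.1   # how big each correction is

# Step 1: start from a simple prediction, the average of the targets.
baseline = y.mean()
prediction = np.full_like(y, baseline)

trees = []
errors = []  # mean squared error measured before each correction
for _ in range(n_trees):
    # Step 2: measure the leftover error (the residual) for every row.
    residual = y - prediction
    errors.append(np.mean(residual ** 2))

    # Step 3: train a small tree to predict that error, then add a
    # scaled-down version of its correction to the running prediction.
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    prediction = prediction + learning_rate * tree.predict(X)
    trees.append(tree)

final_error = np.mean((y - prediction) ** 2)


def predict(X_new):
    """Final model: the baseline plus the sum of all scaled corrections."""
    X_new = np.asarray(X_new)
    return baseline + learning_rate * sum(t.predict(X_new) for t in trees)


print(f"error before any tree: {errors[0]:.3f}")
print(f"error after {n_trees} trees: {final_error:.3f}")
```

Raising learning_rate makes each tree's correction larger, so the training error falls in fewer steps, at the cost of a higher risk of chasing noise.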
A worked example
We classify our fruit again, comparing a random forest with GradientBoostingClassifier on the same data.
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier

# Two numeric features per fruit: three apples, then three oranges.
X = [[7, 150], [7, 170], [6, 140],   # apples
     [8, 110], [9, 120], [8, 100]]   # oranges
y = ['apple', 'apple', 'apple',
     'orange', 'orange', 'orange']

# Train both models on the same six rows.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
boost = GradientBoostingClassifier(n_estimators=100,
                                   learning_rate=0.1, random_state=0).fit(X, y)

# Ask each model about a fruit it has not seen.
mystery = [[7, 160]]
print('Forest ->', forest.predict(mystery)[0])
print('Boost ->', boost.predict(mystery)[0])

Output:
Forest -> apple
Boost -> apple

Note: Both call the mystery fruit an apple. On this tiny dataset they agree, but on large, messy real-world tables boosting often edges ahead, because it keeps working on the hard, easily confused rows that a forest treats the same as any other.
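Six rows are too few to tell the two models apart. A rough sketch of a fairer comparison follows, assuming a synthetic dataset from make_classification as a stand-in for a larger, noisier table (the dataset, its size and the names X_big, y_big and scores are assumptions for illustration). Which model comes out ahead depends on the data, so no result is claimed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a larger, noisier table (not real data).
X_big, y_big = make_classification(
    n_samples=500, n_features=10, n_informative=5,
    flip_y=0.05, random_state=0,
)

models = {
    "Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Boost": GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, random_state=0
    ),
}

# 5-fold cross-validation: each model is scored on rows it did not train on.
scores = {
    name: cross_val_score(model, X_big, y_big, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean accuracy {score:.3f}")
```

Cross-validation is used instead of a single prediction so that the comparison reflects accuracy on unseen rows rather than one lucky guess.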
XGBoost, LightGBM and friends
You will hear names like XGBoost, LightGBM and CatBoost. These are faster, more polished libraries that all do gradient boosting under the hood. They are regular winners in Kaggle competitions and are widely used in real-world work on tabular data (rows and columns, like a spreadsheet). scikit-learn’s GradientBoostingClassifier teaches the same core idea; the others add engineering optimisations so they run faster on big data.
Watch out: Boosting can overfit if you use too many trees or too high a learning rate: it will eventually start memorising the noise. Use a modest learning_rate (around 0.05–0.1), watch the score on held-out validation data rather than the training score, and tune the number of trees (the Hyperparameter Tuning lesson shows how).
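One way to watch for this is to score the model on held-out rows after every tree. The sketch below assumes synthetic, deliberately noisy data and a simple train/validation split (both are illustrative choices); it uses GradientBoostingClassifier's staged_predict method, which yields the predictions after 1 tree, 2 trees, and so on.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, deliberately noisy data (illustrative only).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Train more trees than we expect to need, then look back over the stages.
boost = GradientBoostingClassifier(
    n_estimators=300, learning_rate=0.1, random_state=0
).fit(X_train, y_train)

# Accuracy on the held-out rows after each additional tree.
val_scores = [accuracy_score(y_val, pred)
              for pred in boost.staged_predict(X_val)]

# The stage with the best held-out accuracy suggests how many trees to keep.
best_n_trees = val_scores.index(max(val_scores)) + 1
print(f"best validation accuracy {max(val_scores):.3f} "
      f"at {best_n_trees} of {len(val_scores)} trees")
```

If the best score arrives well before the last tree, the later trees are adding little or are starting to fit noise, and a smaller n_estimators is worth trying.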
Tip: Rule of thumb on tabular data: a random forest is the safe, no-fuss default; gradient boosting is what you reach for when you want to squeeze out the last bit of accuracy and are willing to tune it.
Q. How does boosting differ from bagging (random forest)?
✍️ Practice
- Change learning_rate to 1.0 and to 0.01 on the fruit data and note that the prediction can wobble at extremes.
- Write one sentence each explaining when you would pick a random forest and when you would pick gradient boosting.
🏠 Homework
- Explain the “team of exam markers” analogy for boosting in your own words, and say why each new tree focuses on the previous mistakes.