Going DeeperPro· 35 min read

Ensembles: Bagging, Boosting, Stacking & Voting

Combining many models almost always beats one — here is the map of how to combine them.

What you will learn

  • Tell bagging, boosting and stacking apart
  • Build a VotingClassifier from several models
  • Pick the right ensemble for a problem

Why combine models at all?

An ensemble is a team of models that work together instead of one model working alone. You already met two of them — random forest and gradient boosting — but never saw the full family on one map. The idea behind every ensemble is the same: many so-so models, combined wisely, usually beat any single model, because their mistakes tend to cancel out.

There are four main ways to combine models. Learn the map once and you can always pick the right tool.

MethodHow models combineYou already saw
BaggingTrain many models on random slices in parallel, then average / voteRandom Forest
BoostingTrain models in a chain, each fixing the last one’s mistakesGradient Boosting
VotingTrain a few different model types, let them voteThis lesson
StackingFeed several models’ predictions into a final “judge” modelThis lesson

Bagging vs boosting in one picture

  • Bagging = a panel of judges who vote at the same time, each having seen a different sample of the evidence. Their average is steadier than any one judge. It mainly cuts variance (the model being shaky).
  • Boosting = a relay team where each runner starts where the last one struggled. It mainly cuts bias (the model being too simple), but pushed too hard it can overfit.

Voting: let different model types vote

A voting classifier trains several different models — say logistic regression, a decision tree and KNN — and combines their answers. Because each model is wrong in its own way, the majority vote is often right even when one model slips. scikit-learn’s VotingClassifier wires this up for you.

There are two voting styles: hard voting counts each model’s final label as one vote; soft voting averages their predicted probabilities (usually a little better when models can give probabilities).

Three different models vote on each flower
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

vote = VotingClassifier(estimators=[
    ('lr', LogisticRegression(max_iter=300)),
    ('tree', DecisionTreeClassifier(random_state=0)),
    ('knn', KNeighborsClassifier(n_neighbors=3)),
], voting='hard')
vote.fit(Xtr, ytr)

print('Voting accuracy:', round(vote.score(Xte, yte), 3))

Note: Output: Voting accuracy: 0.978 Three very different models — a line-based one, a tree and a neighbour-counter — each cast a vote, and the majority decided each flower. When one model is unsure, the other two usually carry the right answer.

Stacking: a model that learns whom to trust

Stacking goes one step further than voting. Instead of a fixed majority vote, it trains a final model (called the meta-model, often a simple logistic regression) whose inputs are the other models’ predictions. The meta-model learns which base model to trust when — for example, “trust the tree on these kinds of rows, the KNN on those”.

  1. Train several base models (tree, KNN, logistic regression).
  2. Collect their predictions on the data.
  3. Feed those predictions as features into a small meta-model.
  4. The meta-model learns the best way to blend them into the final answer.

Stacking can squeeze out a little more accuracy than voting, at the cost of more complexity and training time. It is a favourite finishing move in competitions.

Watch out: More models is not always better. Ensembles are slower to train and predict, and harder to explain than a single model. If you must justify every decision (a loan refusal, a medical flag), a clear single tree may beat a black-box ensemble.

Tip: A practical ladder: start with a single model to get a baseline, reach for a random forest (bagging) as a strong, low-effort default, try gradient boosting when you need the last few percent on tabular data, and use voting/stacking to combine your best few models at the end.

Q. What is the key difference between bagging and boosting?

Answer: Bagging (like Random Forest) trains many models independently in parallel and averages them. Boosting (like Gradient Boosting) trains models sequentially, each one focusing on the errors the previous ones made.

✍️ Practice

  1. Change the VotingClassifier to voting='soft' (all three estimators can give probabilities) and compare the accuracy.
  2. Explain in one sentence why a voting classifier of three different model types can beat any one of them alone.

🏠 Homework

  1. In your own words, describe bagging, boosting, voting and stacking in one sentence each, and name one situation where you would prefer a single simple model over any ensemble.
Want to learn this with a mentor?

CodingClave runs guided, project-based training (28-day, 45-day & 6-month batches).

Explore Training →