Supervised Learning · Extra · 35 min read

Random Forest

Many decision trees vote together — usually far more accurate than one tree alone.

What you will learn

  • Explain the “wisdom of the crowd” idea
  • Train RandomForestClassifier
  • Know why a forest beats a single tree

One tree can be wrong; a crowd is steadier

A single decision tree can be shaky: change a little data and it can draw a very different flowchart. A random forest reduces this by training many trees, each on a random sample of the rows (drawn with replacement) and with only a random subset of the features considered at each split, and then letting them vote.

This is the wisdom of the crowd. One person might guess wrong, but the average of many independent guesses is usually close to right.
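To put a number on the crowd idea, here is a minimal sketch. It assumes every voter is right with the same probability and that voters make their mistakes independently of each other. That is an idealization: the trees in a real forest learn from overlapping data, so their errors are partly shared and the gain is smaller. The function name majority_vote_accuracy is made up for this illustration.

```python
from math import comb

def majority_vote_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that more than half of the voters are right.

    Assumes an odd n_voters (so there are no ties) and that each voter
    is right with probability p_correct, independently of the others.
    """
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )

# Each voter alone is right only 60% of the time.
for n in (1, 11, 101):
    print(n, "voters ->", round(majority_vote_accuracy(n, 0.6), 3))
```

With one voter the result is simply 0.6. As the crowd grows, the majority is right more and more often, even though no individual voter got any better.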

A worked example

We will classify fruit again, but with a forest of 100 trees instead of one rule.

A random forest of 100 trees voting on each fruit
from sklearn.ensemble import RandomForestClassifier

# Each row is one fruit: [width, weight]
X = [[7, 150], [7, 170], [6, 140],     # apples
     [8, 110], [9, 120], [8, 100]]     # oranges
y = ['apple','apple','apple',
     'orange','orange','orange']

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

print('Fruit [7,160] ->', model.predict([[7, 160]])[0])
print('Fruit [9,105] ->', model.predict([[9, 105]])[0])

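Note: Output:

Fruit [7,160] -> apple
Fruit [9,105] -> orange

Each of the 100 trees votes and the majority wins. (Strictly, scikit-learn averages the trees' class probabilities, which comes to the same thing when every tree gives a clear-cut answer.) The heavy, narrow fruit is called an apple and the light, wide one an orange, and the crowd vote is usually more reliable than a single tree's answer.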
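If you want to see the vote itself, the sketch below counts what each tree says about one fruit. It rebuilds the same model so it can run on its own. One detail to be aware of: the trees stored in model.estimators_ predict a position in model.classes_ rather than the label text, so the code maps that position back to a label. The variable names fruit and votes are just for this example.

```python
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

# Same toy data as the worked example: each row is [width, weight].
X = [[7, 150], [7, 170], [6, 140],     # apples
     [8, 110], [9, 120], [8, 100]]     # oranges
y = ['apple', 'apple', 'apple',
     'orange', 'orange', 'orange']

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

fruit = [[7, 160]]

# Ask every tree for its answer and tally the labels.
votes = Counter(
    str(model.classes_[int(tree.predict(fruit)[0])])
    for tree in model.estimators_
)
print(votes)

# predict_proba gives the averaged class probabilities,
# in the same order as model.classes_.
print(model.classes_, model.predict_proba(fruit)[0])
```

A few trees may disagree, because their random sample of six fruits happened to miss one of the two classes. The majority still settles on the same answer as model.predict.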

Which feature mattered most?

A forest can also tell you which features were most useful — great for understanding your data.

How important each feature was to the forest
for name, score in zip(['width','weight'], model.feature_importances_):
    print(name, '->', round(score, 2))

Note: Output:

width -> 0.42
weight -> 0.58

In this run, weight mattered a little more than width for telling these fruits apart. Your exact numbers may differ, but the idea is that the forest ranks your features for you.
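With only two features you can read the ranking by eye, but with many features it helps to sort them. The sketch below rebuilds the fruit model so it runs on its own, then sorts the scores from most to least useful. The names feature_names and ranked are chosen for this example. In scikit-learn these scores are based on how much each feature reduces impurity across the trees, and they are scaled to add up to 1.

```python
from sklearn.ensemble import RandomForestClassifier

# Same toy data as the worked example: each row is [width, weight].
X = [[7, 150], [7, 170], [6, 140],     # apples
     [8, 110], [9, 120], [8, 100]]     # oranges
y = ['apple', 'apple', 'apple',
     'orange', 'orange', 'orange']

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

feature_names = ['width', 'weight']

# Pair each name with its score, then sort with the highest score first.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f'{name}: {score:.2f}')

# The scores are normalized, so together they add up to 1.
print('total ->', round(float(sum(model.feature_importances_)), 2))
```

On a dataset this small both features separate the fruits cleanly, so the two scores tend to land fairly close together.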

|  | Single tree | Random forest |
|---|---|---|
| Accuracy | Okay | Usually higher |
| Stability | Shaky | Steady |
| Easy to read | Very | Less (it is many trees) |
| Speed | Fast | Slower (more trees) |
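You can check the accuracy row yourself. The six-fruit dataset is too small for a fair comparison, so this sketch uses synthetic data from make_classification instead (an assumption made for this example, not part of the lesson's data). It scores a single tree and a forest with 5-fold cross-validation, which means each model is trained and tested five times on different splits and the accuracies are averaged.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, not the fruit example: 500 rows, 10 features.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)

models = {
    'single tree': DecisionTreeClassifier(random_state=0),
    'random forest': RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 different train/test splits.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f'{name}: {scores[name]:.3f}')
```

On data like this the forest usually scores higher, but treat the gap as something to measure on your own data rather than a guarantee.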

Tip: Random forest is a strong default for many problems: it is accurate, copes well with noisy data, and needs little tuning. n_estimators sets how many trees to grow.

Q. Why is a random forest usually better than a single decision tree?

Answer: A forest combines many trees, each trained on a different random sample of the data. Their majority vote is usually more accurate and more stable than a single tree.

✍️ Practice

  1. Change n_estimators to 10 and re-run. Do the predictions stay the same?
  2. Print feature_importances_ and write one sentence on which feature helped most.

🏠 Homework

  1. In your own words, explain “wisdom of the crowd” and how a random forest uses it to beat a single tree.