
Random Forest

Many decision trees vote together, and the combined answer is usually more accurate than one tree alone.

What you will learn

Explain the “wisdom of the crowd” idea
Train RandomForestClassifier
Know why a forest beats a single tree

One tree can be wrong; a crowd is steadier

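A single decision tree can be shaky: change a little of the training data and it may draw a very different flowchart. A random forest fixes this by training many trees and letting them vote. Each tree sees a slightly different random sample of the rows (drawn with replacement, a so-called bootstrap sample) and considers only a random subset of the features at each split, so the trees tend to make different mistakes.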
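To see the recipe in miniature, here is a hand-rolled sketch that bootstraps the rows, grows one DecisionTreeClassifier per sample, and takes a majority vote. It is an illustration only, not how RandomForestClassifier is implemented internally: the 25 trees, the seeds, and the helper name forest_predict are our own choices, and max_features='sqrt' is used to mimic the random feature subsets.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

# Same fruit data as the worked example: each row is [width, weight]
X = np.array([[7, 150], [7, 170], [6, 140],
              [8, 110], [9, 120], [8, 100]])
y = np.array(['apple', 'apple', 'apple', 'orange', 'orange', 'orange'])

rng = np.random.default_rng(0)
trees = []
for i in range(25):
    # Bootstrap sample: draw len(X) row indices WITH replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features='sqrt' makes each split consider a random subset of features
    tree = DecisionTreeClassifier(max_features='sqrt', random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

def forest_predict(sample):
    """Hypothetical helper: ask every tree, return the most common answer."""
    votes = [tree.predict([sample])[0] for tree in trees]
    return Counter(votes).most_common(1)[0][0]

print(forest_predict([7, 160]))
print(forest_predict([9, 105]))
```

Each call to forest_predict asks all 25 trees and returns the label with the most votes.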

This is the wisdom of the crowd. One person might guess wrong, but the average of many independent guesses is usually close to right.
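A quick calculation shows why the crowd helps. The sketch below assumes each voter is right 60% of the time and that the voters are fully independent, then computes the chance that more than half of them are right (a binomial sum). The name majority_accuracy is hypothetical, and real trees are never fully independent, so the gain a forest actually gets is smaller than this idealized one.

```python
from math import comb

def majority_accuracy(n_voters, p_correct):
    """Chance that more than half of n independent voters are right (use odd n)."""
    need = n_voters // 2 + 1  # smallest winning number of correct votes
    return sum(comb(n_voters, k) * p_correct**k * (1 - p_correct)**(n_voters - k)
               for k in range(need, n_voters + 1))

# Each voter alone is right 60% of the time
for n in [1, 3, 11, 101]:
    print(n, 'voters ->', round(majority_accuracy(n, 0.6), 3))
```

With one voter the answer is just 0.6; adding more independent voters pushes the majority's accuracy up. This is also why a forest deliberately injects randomness: the less alike the trees are, the closer they get to this ideal.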

A worked example

We will classify fruit again, but with a forest of 100 trees instead of one rule.

A random forest of 100 trees voting on each fruit

from sklearn.ensemble import RandomForestClassifier

X = [[7, 150], [7, 170], [6, 140],     # apples  (each row is [width, weight])
     [8, 110], [9, 120], [8, 100]]     # oranges
y = ['apple','apple','apple',
     'orange','orange','orange']

# Grow 100 trees; random_state makes the random sampling repeatable
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

print('Fruit [7,160] ->', model.predict([[7, 160]])[0])
print('Fruit [9,105] ->', model.predict([[9, 105]])[0])

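Note: Output:
Fruit [7,160] -> apple
Fruit [9,105] -> orange
Each of the 100 trees casts a vote and the majority wins. (Strictly, scikit-learn averages the trees' class probabilities rather than counting hard votes, but the idea is the same.) The heavy, narrow fruit is called an apple and the light, wide one an orange, and the crowd's answer is usually more reliable than any single tree's.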
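To peek at the vote behind those predictions, the sketch below refits the same model and asks each tree in model.estimators_ for its answer. One scikit-learn implementation detail to be aware of: the individual trees are fit on integer-encoded labels, so their predictions are mapped back through model.classes_. The variable names are illustrative.

```python
import numpy as np
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

X = [[7, 150], [7, 170], [6, 140],
     [8, 110], [9, 120], [8, 100]]
y = ['apple', 'apple', 'apple', 'orange', 'orange', 'orange']
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

sample = np.array([[7, 160]])

# Each tree predicts a class index (0.0 or 1.0); map it back to a label
votes = Counter(str(model.classes_[int(tree.predict(sample)[0])])
                for tree in model.estimators_)
print('hard votes:', dict(votes))

# The forest's own answer comes from the averaged class probabilities
proba = dict(zip(map(str, model.classes_), model.predict_proba(sample)[0]))
print('averaged probabilities:', {c: round(float(p), 2) for c, p in proba.items()})
```

Because these trees are grown until every leaf is pure, each tree's probability is 0 or 1, so the averaged probability equals the share of hard votes.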

Which feature mattered most?

A forest can also tell you which features were most useful — great for understanding your data.

How important each feature was to the forest

# Names listed in the same order as the columns of X
for name, score in zip(['width','weight'], model.feature_importances_):
    print(name, '->', round(score, 2))

Note: Output:
width -> 0.42
weight -> 0.58
Here weight mattered a little more than width for telling these fruits apart. Your exact numbers may differ, but the idea holds: the forest ranks your features for you, and the scores add up to 1.
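To list features from most to least useful, sort the (name, score) pairs. This is a minimal sketch that refits the same fruit model; the feature_names list is our own label for the two columns of X. scikit-learn documents feature_importances_ as impurity-based importances whose values sum to 1.

```python
from sklearn.ensemble import RandomForestClassifier

feature_names = ['width', 'weight']  # same order as the columns of X
X = [[7, 150], [7, 170], [6, 140],
     [8, 110], [9, 120], [8, 100]]
y = ['apple', 'apple', 'apple', 'orange', 'orange', 'orange']
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Pair each name with its score and sort from most to least important
ranking = sorted(zip(feature_names, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f'{name:>6}: {score:.2f}')

# The importances are normalized, so they add up to 1
print('total:', round(float(sum(model.feature_importances_)), 2))
```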

	Single tree	Random forest
Accuracy	Okay	Usually higher
Stability	Shaky	Steady
Easy to read	Very	Less (it is many trees)
Speed	Fast	Slower (more trees)
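The Accuracy and Stability rows can be checked on a larger dataset. This sketch uses 5-fold cross-validation on the breast-cancer dataset that ships with scikit-learn (our choice, since the fruit data is too small for this). The mean score stands in for accuracy, and the spread across folds is only a rough proxy for stability; exact numbers depend on your scikit-learn version, so none are shown here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    'single tree': DecisionTreeClassifier(random_state=0),
    'random forest': RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, clf in models.items():
    # 5-fold cross-validation: mean = typical accuracy, std = wobble between folds
    scores = cross_val_score(clf, X, y, cv=5)
    results[name] = scores
    print(f'{name:>13}: accuracy {scores.mean():.3f} (+/- {scores.std():.3f})')
```

You will usually see the forest score higher, but treat this as one experiment rather than proof.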

Tip: Random forest is a fantastic default for many problems: it is accurate, robust, and needs little tuning. n_estimators is simply the number of trees to grow (scikit-learn's default is 100); more trees give a steadier vote but take longer to train.

Q. Why is a random forest usually better than a single decision tree?

Answer: A forest combines many trees, each trained on a different random sample of the data and using random subsets of features at each split. Their majority vote is usually more accurate and more stable than any single tree.

✍️ Practice

Change n_estimators to 10 and re-run. Do the predictions stay the same?
Print feature_importances_ and write one sentence on which feature helped most.

🏠 Homework

In your own words, explain “wisdom of the crowd” and how a random forest uses it to beat a single tree.