Decision Trees
A decision tree asks a series of yes/no questions to reach an answer, which makes it easy to read.
What you will learn
- Explain how a decision tree decides
- Train a DecisionTreeClassifier in scikit-learn
- Appreciate why trees are easy to read
A flowchart the computer builds
A decision tree is a flowchart of yes/no questions. Starting at the top, you answer each question and follow the branch until you reach an answer at the bottom.
The clever part: the computer builds the flowchart itself from the data, choosing the questions that best separate the classes.
What a learned tree looks like
For our pass/fail study data, a tree might learn something like this:
Is hours_studied > 4 ?
├── No -> predict FAIL
└── Yes -> predict PASS
Note: There is no program output here. This is the rule the tree learned, drawn as a flowchart. Real trees can have many such questions stacked up.
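Read as code, the flowchart above is just an if/else. The sketch below writes that one-question rule by hand; the function name predict_result is made up for illustration, and a trained tree stores equivalent logic internally rather than as Python source.

```python
def predict_result(hours_studied):
    """Hand-written version of the one-question tree in the flowchart."""
    if hours_studied > 4:   # the single question the tree asks
        return "PASS"       # "Yes" branch
    return "FAIL"           # "No" branch


for hours in (3, 6):
    print(f"Studied {hours}h ->", predict_result(hours))
```

A deeper tree would simply nest more if/else checks inside each branch.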
Training one in scikit-learn
The code follows the same familiar pattern. We also peek at how the tree splits.
from sklearn.tree import DecisionTreeClassifier
X = [[1],[2],[3],[4],[5],[6],[7],[8]] # hours studied
y = [ 0, 0, 0, 0, 1, 1, 1, 1] # 0 = fail, 1 = pass
model = DecisionTreeClassifier(max_depth=1) # one question only
model.fit(X, y)
print('Studied 3h ->', model.predict([[3]])[0])
print('Studied 6h ->', model.predict([[6]])[0])
Output:
Studied 3h -> 0
Studied 6h -> 1
With max_depth=1 the tree asks a single question, splitting between the 4-hour and 5-hour students, and predicts fail (0) or pass (1). This is exactly the rule you would draw by hand.
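To see the question the tree actually learned, scikit-learn's export_text helper prints a fitted tree as indented text. This is a small sketch that refits the same model; the feature name hours_studied is only a display label chosen here.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1], [2], [3], [4], [5], [6], [7], [8]]  # hours studied
y = [0, 0, 0, 0, 1, 1, 1, 1]                  # 0 = fail, 1 = pass

model = DecisionTreeClassifier(max_depth=1)   # one question only
model.fit(X, y)

# Print the learned rule as indented text.
rules = export_text(model, feature_names=["hours_studied"])
print(rules)

# The cut-off used by the first (and only) question.
threshold = model.tree_.threshold[0]
print("Threshold:", threshold)
```

scikit-learn places the cut-off halfway between neighbouring values, so on this data the threshold comes out as 4.5 rather than exactly 4. Both describe the same split between the 4-hour and 5-hour students.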
How the tree picks its question
You might wonder: how does the computer know to ask “hours > 4?” It tries many possible split points and keeps the one that best separates the classes. A good split leaves each side as “pure” as possible — mostly one class.
| Possible split | Left side | Right side | How clean? |
|---|---|---|---|
| hours > 2 | Fail, Fail | Fail, Fail, Pass, Pass, Pass, Pass | Right side is mixed — messy |
| hours > 4 | Fail, Fail, Fail, Fail | Pass, Pass, Pass, Pass | Perfect — each side is one class |
| hours > 6 | Fail, Fail, Fail, Fail, Pass, Pass | Pass, Pass | Left side is mixed — messy |
The split at 4 hours is the winner because both sides end up completely pure (all Fail on one side, all Pass on the other). The tree picks that question automatically — you never told it the number 4.
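The idea of "purity" can be turned into a number. One common measure is Gini impurity, which is the default criterion in scikit-learn's DecisionTreeClassifier: it is 0 when a group holds a single class and grows as the group gets more mixed. The sketch below scores the three candidate splits from the table in plain Python. The helper names gini and split_score are made up for illustration, and a real library checks every possible split point, not just three.

```python
def gini(labels):
    """Gini impurity for 0/1 labels: 0.0 means the group is one class."""
    if not labels:
        return 0.0
    p_pass = sum(labels) / len(labels)
    return 1.0 - p_pass ** 2 - (1.0 - p_pass) ** 2


def split_score(hours, labels, threshold):
    """Size-weighted impurity of the two sides (lower is cleaner)."""
    left = [y for x, y in zip(hours, labels) if x <= threshold]
    right = [y for x, y in zip(hours, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


hours = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = fail, 1 = pass

scores = {t: split_score(hours, labels, t) for t in (2, 4, 6)}
for t, score in scores.items():
    print(f"hours > {t}: weighted impurity = {score:.3f}")

best = min(scores, key=scores.get)
print("Best split: hours >", best)
```

The splits at 2 and 6 both score about 0.333, while the split at 4 scores 0, so it wins.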
Why people like trees
- They are easy to read — you can follow the questions and explain any decision.
- They handle numbers without much prep. Categories work too, although scikit-learn needs them encoded as numbers first.
- They need no feature scaling, unlike KNN.
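The point about scaling can be checked directly. As a rough sketch, the code below trains one tree on hours and another on the same data converted to minutes, then compares predictions for the same two students. The variable names are chosen here for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

hours = [[1], [2], [3], [4], [5], [6], [7], [8]]
minutes = [[h[0] * 60] for h in hours]  # same data, different unit
y = [0, 0, 0, 0, 1, 1, 1, 1]            # 0 = fail, 1 = pass

tree_hours = DecisionTreeClassifier(max_depth=1).fit(hours, y)
tree_minutes = DecisionTreeClassifier(max_depth=1).fit(minutes, y)

# The same two students, described in hours and in minutes.
pred_hours = tree_hours.predict([[3], [6]]).tolist()
pred_minutes = tree_minutes.predict([[180], [360]]).tolist()
print("Trained on hours:  ", pred_hours)
print("Trained on minutes:", pred_minutes)
```

Both trees give the same answers because a tree only cares about the order of the values, not their size. Only the cut-off number changes with the unit.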
Watch out: A deep tree can grow a question for every training row and memorise the data — a problem called overfitting (covered soon). Limiting max_depth keeps it sensible.
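A quick way to see this memorising is to compare an unlimited tree with a depth-limited one on slightly messy data. The labels in y_noisy below are invented for this sketch (two students who do not follow the usual pattern); get_n_leaves and score are standard DecisionTreeClassifier methods.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[1], [2], [3], [4], [5], [6], [7], [8]]  # hours studied
y_noisy = [0, 1, 0, 0, 1, 0, 1, 1]            # made-up labels with two surprises

deep = DecisionTreeClassifier(random_state=0).fit(X, y_noisy)
shallow = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y_noisy)

for name, tree in (("no depth limit", deep), ("max_depth=1", shallow)):
    print(name,
          "| leaves:", tree.get_n_leaves(),
          "| training accuracy:", tree.score(X, y_noisy))
```

The unlimited tree keeps adding questions until it gets every training row right, surprises included, while the depth-limited tree settles for one simple rule and accepts a few mistakes.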
Tip: Decision trees are the building block of the powerful random forest in the next lesson — which is just many trees working together.
Q. How does a decision tree make a prediction?
✍️ Practice
- Set max_depth=2, re-fit, and predict for 4 and 5 hours. Note any change.
- Draw, as a flowchart, a decision tree that decides “take an umbrella?” from “is it raining?” and “are there clouds?”.
🏠 Homework
- Write a 3-question decision tree (in flowchart text) that decides whether to wear a jacket, based on temperature and wind.