Supervised Learning · Core · 35 min read

Decision Trees

A decision tree asks a series of yes/no questions to reach an answer — easy to read.

What you will learn

  • Explain how a decision tree decides
  • Train DecisionTreeClassifier
  • Appreciate why trees are easy to read

A flowchart the computer builds

A decision tree is a flowchart of yes/no questions. Starting at the top, you answer each question and follow the branch until you reach an answer at the bottom.

The clever part: the computer builds the flowchart itself from the data, choosing the questions that best separate the classes.

What a learned tree looks like

For our pass/fail study data, a tree might learn something like this:

A tiny decision tree the model could learn
Is hours_studied > 4 ?
├── No  -> predict FAIL
└── Yes -> predict PASS

Note: There is no output here. This is the rule the tree learned, drawn as a flowchart. Real trees can stack many such questions on top of each other.
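Written as code, that flowchart is just an if/else. Here is a minimal hand-written sketch; the function name predict_result is made up for illustration, and a real tree learns the threshold from data rather than having it typed in:

```python
def predict_result(hours_studied):
    """Hand-coded version of the tiny tree above (hypothetical helper)."""
    if hours_studied > 4:      # the single yes/no question
        return "PASS"          # Yes branch
    return "FAIL"              # No branch

print(predict_result(3))   # FAIL
print(predict_result(6))   # PASS
```

Each extra question in a bigger tree would simply become another nested if.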

Training one in scikit-learn

The code follows the same familiar pattern: create the model, fit it, then predict. We cap the tree at one question so its single split is easy to reason about.

A shallow decision tree on the study data
from sklearn.tree import DecisionTreeClassifier

X = [[1],[2],[3],[4],[5],[6],[7],[8]]   # hours studied
y = [ 0,  0,  0,  0,  1,  1,  1,  1]     # 0 = fail, 1 = pass

model = DecisionTreeClassifier(max_depth=1)   # one question only
model.fit(X, y)

print('Studied 3h ->', model.predict([[3]])[0])
print('Studied 6h ->', model.predict([[6]])[0])

Note: Output:
Studied 3h -> 0
Studied 6h -> 1

With max_depth=1 the tree asks a single question (about the 4-hour mark) and predicts fail or pass. scikit-learn places the threshold at 4.5, midway between the 4-hour and 5-hour rows. That is exactly the rule you would draw by hand.
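To see the question the tree actually learned, one option is scikit-learn's export_text helper, which prints the tree as indented text. The sketch below refits the same model so it runs on its own; the label "hours_studied" is our own name for the single column, not something the model knows:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1], [2], [3], [4], [5], [6], [7], [8]]   # hours studied
y = [0, 0, 0, 0, 1, 1, 1, 1]                   # 0 = fail, 1 = pass

model = DecisionTreeClassifier(max_depth=1)
model.fit(X, y)

# Text drawing of the learned tree; "hours_studied" is our own label for column 0
rules = export_text(model, feature_names=["hours_studied"])
print(rules)

# The threshold stored at the root node (node 0)
print("Root threshold:", model.tree_.threshold[0])
```

The printed rules phrase the question as "hours_studied <= 4.50", which is the same question as "hours > 4?" with the No and Yes branches swapped.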

How the tree picks its question

You might wonder: how does the computer know to ask “hours > 4?” It tries many possible split points and keeps the one that best separates the classes. A good split leaves each side as “pure” as possible — mostly one class.

Possible split | Left side (No) | Right side (Yes) | How clean?
hours > 2 | Fail, Fail | Fail, Fail, Pass, Pass, Pass, Pass | Right side is mixed: messy
hours > 4 | Fail, Fail, Fail, Fail | Pass, Pass, Pass, Pass | Perfect: each side is one class
hours > 6 | Fail, Fail, Fail, Fail, Pass, Pass | Pass, Pass | Left side is mixed: messy

The split at 4 hours is the winner because both sides end up completely pure (all Fail on one side, all Pass on the other). The tree picks that question automatically — you never told it the number 4.
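"How clean?" can be turned into a number. The sketch below uses Gini impurity, which is scikit-learn's default criterion for DecisionTreeClassifier; the helper names gini and split_impurity are made up for this example, and the real library searches every candidate threshold far more efficiently than this loop over three of them:

```python
def gini(labels):
    """Gini impurity: 0.0 when all labels match, 0.5 for a 50/50 binary mix."""
    if not labels:
        return 0.0
    p_pass = sum(labels) / len(labels)      # labels are 0 (fail) or 1 (pass)
    return 1.0 - p_pass**2 - (1.0 - p_pass)**2

def split_impurity(hours, labels, threshold):
    """Size-weighted impurity of the two sides of 'hours > threshold'."""
    left = [lab for h, lab in zip(hours, labels) if h <= threshold]
    right = [lab for h, lab in zip(hours, labels) if h > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

hours = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Score the three candidate splits discussed above (lower is cleaner)
scores = {t: split_impurity(hours, labels, t) for t in (2, 4, 6)}
for t, s in scores.items():
    print(f"hours > {t}: weighted impurity = {s:.3f}")

best = min(scores, key=scores.get)
print("Cleanest split: hours >", best)
```

The splits at 2 and 6 both score about 0.333, while the split at 4 scores 0.000, so it wins.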

Why people like trees

  • They are easy to read — you can follow the questions and explain any decision.
  • They handle numbers and categories without much prep (in scikit-learn, categories still need to be encoded as numbers first).
  • They need no feature scaling, unlike KNN.
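The no-scaling point can be checked with a small experiment: a sketch that trains one tree on hours and another on the same data converted to minutes, then compares predictions. The variable names are our own:

```python
from sklearn.tree import DecisionTreeClassifier

hours = [[1], [2], [3], [4], [5], [6], [7], [8]]
minutes = [[h[0] * 60] for h in hours]          # same data, different unit
y = [0, 0, 0, 0, 1, 1, 1, 1]                    # 0 = fail, 1 = pass

tree_hours = DecisionTreeClassifier(max_depth=1).fit(hours, y)
tree_minutes = DecisionTreeClassifier(max_depth=1).fit(minutes, y)

# Ask both trees about the same two students (3 h = 180 min, 6 h = 360 min)
pred_hours = tree_hours.predict([[3], [6]]).tolist()
pred_minutes = tree_minutes.predict([[180], [360]]).tolist()
print(pred_hours, pred_minutes)
```

Both trees give the same answers because a split only cares about the order of the values, not their size.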

Watch out: A deep tree can grow a question for every training row and memorise the data — a problem called overfitting (covered soon). Limiting max_depth keeps it sensible.
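A small sketch of that effect, using made-up noisy labels (not the lesson's study data) so that no single question separates the classes. It compares a tree with no depth limit against one capped at max_depth=1:

```python
from sklearn.tree import DecisionTreeClassifier

X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y_noisy = [0, 1, 0, 0, 1, 0, 1, 1]     # invented labels with a few "surprises"

deep = DecisionTreeClassifier(random_state=0).fit(X, y_noisy)
shallow = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y_noisy)

for name, tree in [("no limit", deep), ("max_depth=1", shallow)]:
    print(name,
          "| depth:", tree.get_depth(),
          "| leaves:", tree.get_n_leaves(),
          "| training accuracy:", tree.score(X, y_noisy))
```

The unlimited tree keeps adding questions until it gets every training row right, surprises included. The capped tree settles for one question and accepts a few mistakes, which is usually the safer bet on new data.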

Tip: Decision trees are the building block of the powerful random forest in the next lesson — which is just many trees working together.

Q. How does a decision tree make a prediction?

Answer: A decision tree walks down a flowchart of yes/no questions until it reaches a leaf, which gives the prediction.

✍️ Practice

  1. Set max_depth=2, re-fit, and predict for 4 and 5 hours. Note any change.
  2. Draw, as a flowchart, a decision tree that decides “take an umbrella?” from “is it raining?” and “are there clouds?”.

🏠 Homework

  1. Write a 3-question decision tree (in flowchart text) that decides whether to wear a jacket, based on temperature and wind.