The ML Workflow
Every ML project follows the same five steps: data, prep, train, predict, evaluate.
What you will learn
- List the five steps of an ML project
- See why the order matters
- Recognise the steps inside real code
The same five steps every time
No matter the problem, an ML project follows the same recipe. Learn it once and you can read almost any ML script.
- Get data — collect your examples (often a table or a CSV file — short for comma-separated values, a plain-text spreadsheet where each row is a line and the values are separated by commas).
- Prepare — clean it, separate the inputs from the answers, and hold some examples back for testing.
- Train — let a model learn the pattern from the data.
- Predict — ask the trained model about new, unseen data.
- Evaluate — measure how often it is right, then improve.
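To make the first two steps concrete, here is a minimal sketch of reading CSV data and separating inputs from answers. The CSV text and the column names hours_studied and passed are made up for illustration; a real project would usually read from a file instead of a string.

```python
import csv
import io

# Hypothetical CSV text standing in for a real file on disk.
csv_text = """hours_studied,passed
1,0
2,0
5,1
6,1
"""

# 1) GET data: parse each line into a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# 2) PREPARE: separate the inputs (X) from the answers (y),
#    converting the text values to numbers along the way.
X = [[float(row["hours_studied"])] for row in rows]
y = [int(row["passed"]) for row in rows]

print(X)  # [[1.0], [2.0], [5.0], [6.0]]
print(y)  # [0, 0, 1, 1]
```

Each inner list in X is one example, which is the shape scikit-learn models expect for their inputs.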
A kitchen analogy
It is like cooking: get ingredients, prepare them, cook (train), serve (predict), then taste and adjust (evaluate). Skip prep and the whole dish suffers — same with ML.
The five steps in real code
You will meet this exact shape again and again. Read the comments: each one is a step from the list above. The first two lines just import (borrow) the tools we need from sklearn (the name used in code to import the scikit-learn library): a ready-made model and the helper that splits the data.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# 1) GET data: features (X) and answers (y)
X = [[1],[2],[3],[4],[5],[6],[7],[8]] # hours studied
y = [ 0, 0, 0, 0, 1, 1, 1, 1] # 0 = fail, 1 = pass
# 2) PREPARE: hold back 25% to test on later
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
# 3) TRAIN
model = LogisticRegression()
model.fit(X_train, y_train)
# 4) PREDICT for a new student who studied 7 hours
print('Studied 7h ->', model.predict([[7]])[0])
# 5) EVALUATE on the held-back data
print('Accuracy:', model.score(X_test, y_test))
Output:
Studied 7h -> 1
Accuracy: 1.0
Note: The model predicts 1 (pass) for 7 hours and got 100% of the held-back examples right (with 8 examples and test_size=0.25, that is only 2 examples). Do not worry about the details yet; just notice the five steps. We unpack each one over the next lessons.
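If the PREPARE step feels like magic, the sketch below shows roughly what "holding back 25%" means, using plain Python. The function hold_out_split is a hypothetical helper written for this lesson. It is not how scikit-learn implements train_test_split, and it will not necessarily pick the same examples.

```python
import random

def hold_out_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the examples, then hold back a fraction of them for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed -> same split every run
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Index X and y with the same positions so inputs and answers stay paired.
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

X = [[1], [2], [3], [4], [5], [6], [7], [8]]  # hours studied
y = [0, 0, 0, 0, 1, 1, 1, 1]                  # 0 = fail, 1 = pass

X_train, X_test, y_train, y_test = hold_out_split(X, y)
print('Training examples:', len(X_train))  # 6
print('Held-back examples:', len(X_test))  # 2
```

Shuffling before splitting matters here: the data is sorted by hours, so simply taking the last two rows would hold back only students who passed.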
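Watch out: The steps are in this order for a reason. You must prepare and split the data before training. If the model is scored on examples it already saw during training, the score looks better than it deserves and your final score will be a lie (more on that two lessons from now).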
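Here is a small sketch of how a score can lie. Memoriser is a deliberately silly, hypothetical "model" invented for this example: it only remembers the exact examples it was shown and guesses 0 for anything else. Real models such as LogisticRegression do not work this way, but the gap between the two scores illustrates the same danger.

```python
def accuracy(predictions, answers):
    """Fraction of predictions that match the true answers."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

class Memoriser:
    """Toy 'model' that only remembers the exact examples it was shown."""

    def fit(self, X, y):
        self.seen = {tuple(x): label for x, label in zip(X, y)}
        return self

    def predict(self, X):
        # Inputs it never saw get a default guess of 0 (fail).
        return [self.seen.get(tuple(x), 0) for x in X]

X_train, y_train = [[1], [2], [3], [4], [5], [6]], [0, 0, 0, 0, 1, 1]
X_test, y_test = [[7], [8]], [1, 1]  # held back, never shown to the model

model = Memoriser().fit(X_train, y_train)
train_score = accuracy(model.predict(X_train), y_train)
test_score = accuracy(model.predict(X_test), y_test)

print('Score on data it trained on:', train_score)  # 1.0
print('Score on held-back data:', test_score)       # 0.0
```

Judged on its own training data, the memoriser looks perfect. Only the held-back examples reveal that it learned nothing it can use on new students.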
Tip: Whenever you read ML code, find the five steps. Spotting train_test_split, .fit(), .predict() and .score() tells you exactly what the code is doing.
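As a playful illustration of the tip, the sketch below scans a script for those four names and labels the lines where they appear. STEP_MARKERS and spot_steps are hypothetical names made up for this example, and the approach is plain text matching, not a real code analyser, so treat it as a reading aid only.

```python
# Hypothetical marker-to-step table, based on the four names in the tip.
STEP_MARKERS = {
    "train_test_split(": "PREPARE",
    ".fit(": "TRAIN",
    ".predict(": "PREDICT",
    ".score(": "EVALUATE",
}

def spot_steps(source):
    """Return (line_number, step) pairs for lines containing a known marker."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        for marker, step in STEP_MARKERS.items():
            if marker in line:
                found.append((number, step))
    return found

# A short script stored as text so it can be scanned without being run.
script = """\
X_train, X_test, y_train, y_test = train_test_split(X, y)
model.fit(X_train, y_train)
print(model.predict([[7]]))
print(model.score(X_test, y_test))
"""

for number, step in spot_steps(script):
    print(f"line {number}: {step}")
# line 1: PREPARE
# line 2: TRAIN
# line 3: PREDICT
# line 4: EVALUATE
```

The GET step has no marker because it has no single tell-tale function: data can arrive from a list, a CSV file, a database, and so on.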
Q. What is the correct order of the ML workflow?
✍️ Practice
- Write the five steps in your own words without looking.
- In the code above, point to the exact line that does each of the five steps.
🏠 Homework
- Pick any prediction idea (e.g. predict if it will rain) and write one sentence describing each of the five steps for it.