
Saving & Deploying a Model

A trained model trapped in a notebook helps no one — save it to a file and serve it behind a tiny web app.

What you will learn

Save and reload a trained model with joblib
Serve predictions from a small Flask API
Understand the train-once, predict-many workflow

Training is once; predicting is forever

Every script so far trained a model and threw it away when the program ended. That is fine for learning, but useless in the real world. In production you train once, then save the model and load it again to make millions of predictions — without ever retraining. A model that only lives in your notebook cannot help a real user.

Going from “a model in memory” to “a model real users can call” is called deployment. It has two parts: saving the trained model to a file, and serving it behind something a user or another program can talk to (usually a small web app).

Step 1 — save the trained model to a file

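A trained scikit-learn model is just a Python object, so we can save it to disk and load it back later, still fully trained. The standard tool is joblib, which stores objects containing large NumPy arrays (such as a fitted model) more efficiently than plain pickle. Like pickle, joblib.load can run code stored in the file, so only load model files you trust.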

Step 1: train once and save the model to a file

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

joblib.dump(model, 'iris_model.joblib')   # save the trained model
print('Saved iris_model.joblib')

Note: Output: Saved iris_model.joblib
The fully trained forest is now a file on disk. From here on we only load it; there is no need to retrain unless the data changes.

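Anywhere else (another script, a server, next week), you reload it with one line and predict immediately, with no retraining. Use the same scikit-learn version that saved the file, because saved models are not guaranteed to load correctly across versions: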

Load the saved model and predict — no retraining needed

import joblib

model = joblib.load('iris_model.joblib')   # load the trained model back
print('Prediction:', model.predict([[5.1, 3.5, 1.4, 0.2]])[0])

Note: Output: Prediction: 0
The reloaded model predicted species 0 (setosa) for these measurements, exactly as the original model would have. It came off disk already trained.
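The claim that the reloaded model behaves exactly like the original is easy to check. The sketch below retrains the Step 1 model and overwrites iris_model.joblib so that it runs on its own, then compares the in-memory model with the reloaded one on every iris row using NumPy's array_equal.

```python
import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Retrain the Step 1 model so this check is self-contained.
X, y = load_iris(return_X_y=True)
original = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
joblib.dump(original, 'iris_model.joblib')

reloaded = joblib.load('iris_model.joblib')

# Fitted attributes come back with the object...
print('Classes:', reloaded.classes_)        # the three species labels
print('Trees:', len(reloaded.estimators_))  # all 100 fitted trees

# ...so predictions on the full dataset match the in-memory original.
same = np.array_equal(original.predict(X), reloaded.predict(X))
print('Identical predictions on all', len(X), 'rows:', same)
# → Identical predictions on all 150 rows: True
```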

Step 2 — serve it behind a tiny web app

To let other programs (a website, a phone app) get predictions, you wrap the model in a small web server. A lightweight option is Flask: it turns a Python function into a web address (an API endpoint) that accepts input as JSON and returns the prediction as JSON. JSON is a plain-text format that programs commonly use to exchange data.
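To make "JSON is just text" concrete, here is a small sketch using Python's built-in json module. It mimics the round trip the prediction server performs: the client turns a dict into JSON text, the server parses it back, and the reply travels the same way. In the Flask app, request.get_json and jsonify do these conversions for you; json.dumps puts a space after separators by default, so the exact whitespace may differ from what jsonify sends.

```python
import json

# Client side: a Python dict becomes JSON text for the request body.
request_body = json.dumps({'features': [5.1, 3.5, 1.4, 0.2]})
print(request_body)         # → {"features": [5.1, 3.5, 1.4, 0.2]}

# Server side: the text is parsed back into Python objects.
data = json.loads(request_body)
print(data['features'][0])  # → 5.1

# The reply goes the other way: Python dict -> JSON text.
response_body = json.dumps({'species': 0})
print(response_body)        # → {"species": 0}
```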

Here is a complete, minimal prediction server. It loads the saved model once at startup, then answers every request.

Step 2: a tiny Flask API that serves predictions

from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
model = joblib.load('iris_model.joblib')   # load once at startup

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()              # e.g. {"features": [5.1, 3.5, 1.4, 0.2]}
    features = [data['features']]
    species = int(model.predict(features)[0])
    return jsonify({'species': species})

if __name__ == '__main__':
    app.run(port=5000)

Note: Output: (Starts a web server at http://localhost:5000. It does not print a prediction by itself — it waits for requests. Send one and it replies with JSON, see below.)

With that server running, any program can ask it for a prediction by sending JSON to the /predict address. From a terminal you might test it like this:

Calling the running API from the command line

curl -X POST http://localhost:5000/predict \
     -H "Content-Type: application/json" \
     -d '{"features": [5.1, 3.5, 1.4, 0.2]}'

Note: Output: {"species":0}
The server ran .predict() on the features we sent, using the model it loaded at startup, and replied with the species as JSON. That JSON reply is what a website or app would receive and display to the user.
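You can also exercise the endpoint without starting a server or using curl: Flask's built-in test client sends requests to the app in-process. The sketch below reuses the lesson's route but adds input checking, which is an assumption beyond the original server: get_json(silent=True) returns None for a missing or malformed JSON body, and requests without exactly four numbers get a 400 reply instead of a server error. The error message wording is hypothetical. It retrains and saves the model first so it runs on its own.

```python
import joblib
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train and save the Step 1 model so this sketch is self-contained.
X, y = load_iris(return_X_y=True)
joblib.dump(RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y),
            'iris_model.joblib')

app = Flask(__name__)
model = joblib.load('iris_model.joblib')   # load once at startup


@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(silent=True)   # None instead of an error on bad JSON
    features = data.get('features') if isinstance(data, dict) else None
    # The iris model expects exactly four numeric measurements.
    valid = (isinstance(features, list) and len(features) == 4
             and all(isinstance(v, (int, float)) for v in features))
    if not valid:
        return jsonify({'error': 'expected {"features": [four numbers]}'}), 400
    species = int(model.predict([features])[0])
    return jsonify({'species': species})


# The test client calls the app directly; no real server is started.
client = app.test_client()
ok = client.post('/predict', json={'features': [5.1, 3.5, 1.4, 0.2]})
bad = client.post('/predict', json={'wrong_key': []})
print(ok.status_code, ok.get_json())    # → 200 {'species': 0}
print(bad.status_code, bad.get_json())  # 400 plus the error message
```

Checks like these are cheap and keep a typo in a client's request from turning into an unhelpful 500 error.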

The full picture

Train the model on your data (once).
Save it with joblib.dump to a .joblib file.
In a Flask app, joblib.load the file once at startup.
For each request, read the input JSON, call .predict(), return the result as JSON.
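Deploy that app to a host (Render, Railway, a cloud VM) so it is reachable on the internet. There, run it under a production WSGI server such as Gunicorn rather than app.run(), which starts Flask's development server and by default listens only on your own machine.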
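For local experiments, steps 1 to 3 can be folded into a small helper that trains only when no saved file exists yet. This is a convenience sketch with hypothetical function names (train_and_save, load_or_train); a deployed server would normally just load a file produced by a separate training script, as the Flask app in this lesson does.

```python
import os

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

MODEL_PATH = 'iris_model.joblib'


def train_and_save(path=MODEL_PATH):
    """Steps 1-2: train once and save the fitted model to disk."""
    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    joblib.dump(model, path)
    return model


def load_or_train(path=MODEL_PATH):
    """Step 3: reuse the saved file if present; train only if it is missing."""
    if os.path.exists(path):
        return joblib.load(path)
    return train_and_save(path)


model = load_or_train()
print(model.predict([[5.1, 3.5, 1.4, 0.2]])[0])
```

On the first run this trains and writes the file; every later run skips straight to loading.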

For a quick demo with a built-in user interface instead of a raw API, Streamlit is even simpler: a few lines turn your model into a web page with input boxes and a result, with no front-end code. It is one of the quickest ways to show off a model in a portfolio.

Watch out: Whatever preprocessing you did before training (scaling, encoding, imputing) must be applied identically at predict time. Otherwise the model receives inputs on the wrong scale, in the wrong encoding or even in the wrong shape, and predicts nonsense. The clean fix is to save a whole pipeline (preprocessing + model) with joblib, so loading it restores every step at once.

Tip: Saving the pipeline, not just the bare model, is the professional habit: one file carries the scaler, the encoder and the model together, so deployment can never forget a preprocessing step.
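A sketch of the pipeline approach follows. It assumes a StandardScaler plus LogisticRegression, swapped in for illustration because the lesson's random forest does not need scaling; the file name iris_pipeline.joblib is also hypothetical. After loading, you pass raw measurements and the pipeline re-applies the saved scaling itself. The "by hand" lines show what the pipeline does internally, and where a forgotten transform() call would slip in.

```python
import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Preprocessing and model are fitted together and saved as ONE object.
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)
joblib.dump(pipeline, 'iris_pipeline.joblib')

# Predict time: load one file and pass RAW measurements.
loaded = joblib.load('iris_pipeline.joblib')
raw = np.array([[5.1, 3.5, 1.4, 0.2]])
via_pipeline = loaded.predict(raw)   # runs scaler.transform, then model.predict

# The same steps by hand, using the fitted parts stored inside the file.
# Skipping scaler.transform here is exactly the bug the pipeline prevents.
scaler = loaded.named_steps['scaler']
model = loaded.named_steps['model']
by_hand = model.predict(scaler.transform(raw))

print('Pipeline:', via_pipeline[0], '| by hand:', by_hand[0])
```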

Q. Why do we save a trained model to a file with joblib?

Answer: Training is expensive and done once. Saving the trained model (ideally the whole pipeline) lets you load it instantly elsewhere — in a server or another script — and serve fast predictions without ever retraining.

✍️ Practice

Save a model trained inside a Pipeline (scaler + model) and reload it; confirm one prediction works straight from the file.
Explain in one sentence why the Flask app should call joblib.load once at startup rather than inside the predict function.

🏠 Homework

Take the model from your end-to-end project, save it with joblib, and write the four steps you would follow to serve it behind a Flask or Streamlit app for a user.