
Cleaning Data: Missing Values

Real data has blanks. Before you analyse it, you find the missing values and decide whether to fill them or drop them.

What you will learn

Find missing values with isnull().sum()
Fill blanks with fillna()
Decide when to drop instead of fill

Cleaning is most of the job

Data scientists often spend more time cleaning data than on anything else. Garbage in, garbage out: if the data is wrong, every chart and conclusion built on it is wrong too. The most common problem is missing values: blank cells where a number or word should be.

In Pandas a missing value shows up as NaN (short for “Not a Number”), which is how Pandas writes a blank cell. The small table built in the first example further down has one.

Handling missing values is a small three-step routine. We will walk through each step below.

Find the blanks — count how many each column has, with isnull().sum().
Decide what to do — fill the blanks with a sensible value, or drop the rows that have them.
Fix & check — apply your choice, then run isnull().sum() again to confirm no blanks remain.

A table with one missing value

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['Asha', 'Ravi', 'Meera', 'Karan'],
    'age':  [25, np.nan, 28, 32]      # Ravi's age is missing (NaN)
})
print(df)

Note: Output:

    name   age
0   Asha  25.0
1   Ravi   NaN
2  Meera  28.0
3  Karan  32.0

Ravi’s age is NaN, a blank. If we tried to use this column for maths or a chart without fixing it, NaN would cause trouble.
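A minimal sketch of what that trouble can look like, using the same table. It shows two behaviours: arithmetic on NaN gives NaN, while aggregations such as sum() and count() skip NaN by default, so the gap is easy to overlook. The variable names (plus_one, total_default, total_strict, n_values) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Asha', 'Ravi', 'Meera', 'Karan'],
    'age':  [25, np.nan, 28, 32],      # Ravi's age is missing (NaN)
})

# Arithmetic keeps the gap: NaN + 1 is still NaN.
plus_one = df['age'] + 1
print(plus_one)

# Aggregations skip NaN by default, so the result hides the gap.
total_default = df['age'].sum()              # adds only 25 + 28 + 32
total_strict = df['age'].sum(skipna=False)   # NaN, because one value is missing
n_values = df['age'].count()                 # counts non-missing values only

print(total_default, total_strict, n_values)
```

Note that the default total looks perfectly normal even though one person is missing from it, which is why counting the blanks comes first.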

Step 1 — find the blanks

Count missing values per column with isnull().sum(). It is the first thing to run on any new dataset.

Count missing values per column

print(df.isnull().sum())     # how many blanks in each column?

Note: Output:

name    0
age     1
dtype: int64

Name has no blanks; age has exactly 1. Now we know precisely what to fix before going further.
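On larger tables the share of blanks is often more useful than the raw count. One way to get it, sketched here on the same table, is isnull().mean(): True counts as 1 and False as 0, so the mean is the fraction of blanks. The names missing_count and missing_pct are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Asha', 'Ravi', 'Meera', 'Karan'],
    'age':  [25, np.nan, 28, 32],
})

missing_count = df.isnull().sum()          # number of blanks per column
missing_pct = df.isnull().mean() * 100     # share of blanks per column, in percent

print(missing_count)
print(missing_pct)
```

Here age is 25% blank (1 of 4 rows). isna() is an alias for isnull() and gives the same result.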

Step 2 — fill the gap

A common, gentle fix is to fill the blank with the column’s average, so we keep every row. Use fillna().

Fill the missing age with the column mean

df['age'] = df['age'].fillna(df['age'].mean())   # fill blank with the average
print(df['age'])

Note: Output:

0    25.000000
1    28.333333
2    28.000000
3    32.000000
Name: age, dtype: float64

Ravi’s blank became 28.33, the average of the other three ages (25, 28, 32). mean() skips NaN by default, which is why only those three values go into the average. No gaps remain, and we kept all four people.
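The last step of the routine, fix and check, can be sketched by counting blanks before and after the fill. This rebuilds the table so the block runs on its own; before, after, and remaining are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Asha', 'Ravi', 'Meera', 'Karan'],
    'age':  [25, np.nan, 28, 32],
})

before = df.isnull().sum()                        # blanks per column before the fix
df['age'] = df['age'].fillna(df['age'].mean())    # fill the blank with the column mean
after = df.isnull().sum()                         # blanks per column after the fix

print(before)
print(after)

remaining = int(after.sum())                      # total blanks left in the whole table
print(remaining)
```

A total of zero confirms the table is complete, and the row count is unchanged because nothing was dropped.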

Fill or drop?

Instead of filling, you can drop rows that have blanks with dropna(). Which is right depends on how much data is missing.

Approach	Code	Use when
Fill with average	`fillna(df['age'].mean())`	Numbers, few blanks
Fill with a value	`fillna(0)` or `fillna('Unknown')`	A sensible default exists
Drop the rows	`df.dropna()`	Blanks are rare and you can spare them

Watch out: Dropping rows with dropna() throws away data — fine if blanks are rare, risky if many rows have gaps. Think before you delete; filling is often safer.
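To compare the options side by side, here is a sketch that applies each one to its own copy of the example table. The subset argument of dropna() limits the check to the listed columns; the variable names (filled_mean, filled_zero, dropped) are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Asha', 'Ravi', 'Meera', 'Karan'],
    'age':  [25, np.nan, 28, 32],
})

# Each option returns a new DataFrame; df itself is left untouched.
filled_mean = df.assign(age=df['age'].fillna(df['age'].mean()))   # fill with the average
filled_zero = df.assign(age=df['age'].fillna(0))                  # fill with a fixed default
dropped = df.dropna(subset=['age'])                               # drop rows where age is blank

print(len(df), len(filled_mean), len(filled_zero), len(dropped))
print(dropped)
```

Both fills keep all four rows, while dropping leaves three and removes Ravi entirely. Filling with 0 keeps the row but would pull the average age down, so a default only helps when it is a sensible value for that column.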

Tip: Always run isnull().sum() first. You cannot decide how to handle missing values until you know which columns have them and how many.

Q. What does df['age'].fillna(df['age'].mean()) do?

Answer: fillna() fills blank (NaN) values; passing the column mean replaces each missing age with the average of the others.

✍️ Practice

Make a small DataFrame with one missing value and count blanks with isnull().sum().
Fill that blank with the column mean, then try dropna() on a fresh copy and compare the row counts.

🏠 Homework

Take a real CSV, run isnull().sum(), and handle any missing values — fill the numeric columns and decide whether to drop or fill the rest. Note what you chose and why.