PandasExtra· 35 min read

Combining Tables with merge

Real answers often live in two tables — join them on a shared column to bring the data together.

What you will learn

  • Merge two DataFrames on a shared key
  • Understand a left vs inner join
  • Spot unmatched rows

When the data is split across tables

Data is often spread across separate tables — one for orders, one for customers. To answer “which city placed the biggest order?”, you must combine them. Pandas does this with merge, matching rows on a shared column called the key.

This is the same idea as a SQL JOIN, if you have seen one. Here are our two little tables:

Two related tables sharing a cust_id column
import pandas as pd

orders = pd.DataFrame({
    'order_id': [1, 2, 3],
    'cust_id':  [10, 11, 10],
    'amount':   [250, 120, 300]
})

customers = pd.DataFrame({
    'cust_id': [10, 11],
    'city':    ['Pune', 'Delhi']
})

Note: Output: (No output — just two tables. Orders has a custid but no city; customers maps each custid to a city. The shared column cust_id is our key.)

Join them on the key

We tell merge which column to match on with on='cust_id'. Pandas finds each order’s customer and brings in their city.

Merge orders with customers on cust_id
combined = orders.merge(customers, on='cust_id')
print(combined)

Note: Output: orderid custid amount city 0 1 10 250 Pune 1 2 11 120 Delhi 2 3 10 300 Pune Each order now carries the customer’s city. Orders 1 and 3 both belong to customer 10, so both show Pune. We could now answer city questions with groupby.

Inner vs left joins

What if an order points to a customer not in the customers table? The how option decides what happens to unmatched rows.

how=KeepsUse when
'inner' (default)Only rows that match in bothYou want clean, matched data
'left'All left rows; blanks if no matchYou must keep every order
'right'All right rowsYou must keep every customer
'outer'Everything from bothYou want nothing dropped

Watch out: A default (inner) merge silently drops rows that do not match on the key. If your combined table has fewer rows than expected, unmatched keys are usually why — try how='left' to keep them all.

Tip: The key column must mean the same thing in both tables (here, cust_id). If the columns have different names, use left_on and right_on to say which is which.

Q. What does the on argument in orders.merge(customers, on='cust_id') specify?

Answer: on names the key column present in both tables; Pandas matches rows where that column’s values are equal.

✍️ Practice

  1. Create a products table and a sales table sharing a product_id, then merge them.
  2. Change one cust_id in orders to a value not in customers, then compare an inner merge with a how='left' merge.

🏠 Homework

  1. Find or make two related CSVs (e.g. orders and customers), merge them on a shared key, and write one insight you could only get after combining them.
Want to learn this with a mentor?

CodingClave runs guided, project-based training (28-day, 45-day & 6-month batches).

Explore Training →