Combining Tables with merge
Real answers often live in two tables — join them on a shared column to bring the data together.
What you will learn
- Merge two DataFrames on a shared key
- Understand a left vs inner join
- Spot unmatched rows
When the data is split across tables
Data is often spread across separate tables — one for orders, one for customers. To answer “which city placed the biggest order?”, you must combine them. Pandas does this with merge, matching rows on a shared column called the key.
This is the same idea as a SQL JOIN, if you have seen one. Here are our two little tables:
import pandas as pd
orders = pd.DataFrame({
'order_id': [1, 2, 3],
'cust_id': [10, 11, 10],
'amount': [250, 120, 300]
})
customers = pd.DataFrame({
'cust_id': [10, 11],
'city': ['Pune', 'Delhi']
})Note: Output: (No output — just two tables. Orders has a custid but no city; customers maps each custid to a city. The shared column cust_id is our key.)
Join them on the key
We tell merge which column to match on with on='cust_id'. Pandas finds each order’s customer and brings in their city.
combined = orders.merge(customers, on='cust_id')
print(combined)Note: Output: orderid custid amount city 0 1 10 250 Pune 1 2 11 120 Delhi 2 3 10 300 Pune Each order now carries the customer’s city. Orders 1 and 3 both belong to customer 10, so both show Pune. We could now answer city questions with groupby.
Inner vs left joins
What if an order points to a customer not in the customers table? The how option decides what happens to unmatched rows.
| how= | Keeps | Use when |
|---|---|---|
'inner' (default) | Only rows that match in both | You want clean, matched data |
'left' | All left rows; blanks if no match | You must keep every order |
'right' | All right rows | You must keep every customer |
'outer' | Everything from both | You want nothing dropped |
Watch out: A default (inner) merge silently drops rows that do not match on the key. If your combined table has fewer rows than expected, unmatched keys are usually why — try how='left' to keep them all.
Tip: The key column must mean the same thing in both tables (here, cust_id). If the columns have different names, use left_on and right_on to say which is which.
Q. What does the on argument in orders.merge(customers, on='cust_id') specify?
✍️ Practice
- Create a
productstable and asalestable sharing aproduct_id, then merge them. - Change one
cust_idinordersto a value not incustomers, then compare an inner merge with ahow='left'merge.
🏠 Homework
- Find or make two related CSVs (e.g. orders and customers), merge them on a shared key, and write one insight you could only get after combining them.