PandasCore· 35 min read

Group & Aggregate with groupby

Answer “per category” questions — total sales per region, average price per product — by grouping and summarising.

What you will learn

  • Group rows by a column
  • Aggregate each group (sum, mean, count)
  • Read a grouped result

The “per category” question

So many real questions sound like “… per …”: sales per region, average score per class, orders per customer. The tool for all of them is groupby. It splits the table into groups, then summarises each group into one number.

Think of it as three steps — often called split–apply–combine: split the rows into groups, apply a summary to each group, and combine the answers into a small table.

Picture sorting a deck of cards into suits, then counting each pile. Sorting into piles is the split; counting each pile is the apply; writing the four totals on one line is the combine. groupby does all three for you.

Total amount per region
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'South', 'North'],
    'amount': [250, 120, 130, 850, 300]
})

print(df.groupby('region')['amount'].sum())

Note: Output: region North 680 South 970 Name: amount, dtype: int64 Pandas split the rows by region, then summed the amount in each. North totals 680 (250+130+300), South totals 970 (120+850).

Other summaries

Swap .sum() for .mean(), .count(), .max() — whatever question you are asking.

Average and count per group
print(df.groupby('region')['amount'].mean())   # average per region
print('---')
print(df.groupby('region')['amount'].count())  # how many sales per region

Note: Output: region North 226.666667 South 485.000000 Name: amount, dtype: float64 --- region North 3 South 2 Name: amount, dtype: int64 North averages about 227 across its 3 sales; South averages 485 across 2 sales. Same groups, different question.

You want…AggregationQuestion it answers
Total.sum()Sales per region
Average.mean()Average price per product
Count.count()How many orders per customer
Biggest.max()Top sale per region

Tip: The pattern is always the same: df.groupby(WHAT_TO_GROUP_BY)[WHICH_COLUMN].SUMMARY(). Read it as “for each region, take the amount, and sum it”.

Watch out: A bare df.groupby('region') does nothing visible on its own — you must tell it which column to summarise and how (e.g. ['amount'].sum()), or it just returns a grouping object.

Q. What does df.groupby('region')['amount'].mean() calculate?

Answer: groupby('region') makes one group per region, and .mean() gives the average amount within each of those groups.

✍️ Practice

  1. Add a product column to the table and compute the total amount per product.
  2. Use groupby to find the average amount per region, then the count per region.

🏠 Homework

  1. On a real CSV, pick a category column and a number column, and answer one “per category” question (e.g. average price per category). Write the result in a sentence.
Want to learn this with a mentor?

CodingClave runs guided, project-based training (28-day, 45-day & 6-month batches).

Explore Training →