Going Deeper: Production MongoDBExtra· 50 min read

The Aggregation Pipeline in Depth

Turn raw documents into reports and analytics by passing them through a pipeline of stages — group, filter, reshape, join and sort.

What you will learn

  • Build multi-stage pipelines with $match, $group, $sort and $project
  • Join collections with $lookup and flatten arrays with $unwind
  • Use accumulators like $sum, $avg, $min, $max and $push

What aggregation is for

A plain find returns documents as they are stored. But dashboards and reports need computed answers: total sales per month, average rating per product, the top five customers. The aggregation pipeline is MongoDB’s tool for exactly this — it takes your documents and pushes them through a series of stages, each one transforming the data a little, until the final stage produces your report.

Picture a conveyor belt in a factory. Documents ride along it, and each stage is a station that does one job — filter some out, group them, reshape them, sort them. The output of one station becomes the input of the next. You list the stages in order inside an array passed to aggregate([ ... ]).

StageWhat it does
$matchKeeps only documents that match a filter (like find)
$groupBuckets documents by a key and computes totals/averages
$projectChooses and reshapes which fields come out
$sortOrders the results
$lookupJoins in documents from another collection
$unwindSplits an array field into one document per element

A worked example: sales report

Imagine an orders collection where each document is one order: { customer: "Asha", product: "Mouse", amount: 800, status: "paid" }. We want the total amount each customer has paid, biggest spender first. Three stages do it:

Match, then group, then sort — a real report
db.orders.aggregate([
  { $match: { status: "paid" } },
  { $group: { _id: "$customer", totalSpent: { $sum: "$amount" } } },
  { $sort: { totalSpent: -1 } }
])

Reading the belt left to right. $match keeps only orders with status of "paid" (doing this first means later stages handle less data — faster). $group then buckets the remaining orders by customer — the _id: "$customer" sets the grouping key, and the "$customer" with a dollar sign means “the value of the customer field”. For each bucket, $sum: "$amount" adds up every order’s amount into a new field called totalSpent. Finally $sort: { totalSpent: -1 } orders the customer totals from highest to lowest.

Note: Output: [ { _id: 'Asha', totalSpent: 2400 }, { _id: 'Ravi', totalSpent: 1500 }, { _id: 'Meera', totalSpent: 600 } ] Each row is one customer and everything they have paid, biggest spender first — a finished report, computed by the database.

Common accumulators you use inside $group: $sum (add up, or count with $sum: 1), $avg (average), $min and $max (smallest/largest), and $push (collect values into an array). For example avgRating: { $avg: "$rating" } gives the average rating per group.

Joining collections with $lookup

Orders store a customerId, but the customer’s name lives in a separate customers collection. $lookup joins them — it is MongoDB’s version of a SQL join. $unwind then flattens the joined array so each result has a single matched customer object instead of a one-item list.

Join orders to customers, then keep only what you need
db.orders.aggregate([
  { $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
  }},
  { $unwind: "$customer" },
  { $project: { _id: 0, product: 1, name: "$customer.name" } }
])

Step by step. $lookup says: for each order, look in the customers collection (from), find the document whose _id (foreignField) equals this order’s customerId (localField), and attach it under a new field named customer (as). Because a lookup always returns an array of matches, $unwind: "$customer" unpacks that one-item array into a plain object. Finally $project chooses the output shape: hide _id, keep product, and pull the matched customer’s name up to a clean name field with "$customer.name".

Note: Output: [ { product: 'Mouse', name: 'Asha' }, { product: 'Keyboard', name: 'Ravi' } ] Each order now carries its customer’s real name, joined in from the other collection — exactly like a SQL join, but done with pipeline stages.

A reliable way to build any pipeline:

  1. Write down the report you want in one plain sentence (“total paid per customer, biggest first”).
  2. Add a $match stage first to throw away documents you do not need — this keeps later stages fast.
  3. Add a $group stage to bucket and compute totals/averages with accumulators.
  4. Add $lookup (and $unwind) if you need fields from another collection.
  5. Finish with $project to shape the output and $sort to order it.
  6. Build it one stage at a time, running it after each addition so you can see what each stage produced.

Tip: In Mongoose you run the same pipeline with Model.aggregate([ ... ]) — the stages are identical, so everything here transfers straight into your MERN back-end.

Q. Which aggregation stage buckets documents together and computes a total or average for each bucket?

Answer: $group buckets documents by a key (_id) and uses accumulators like $sum and $avg to compute a value per bucket. $match filters, $sort orders, $project reshapes.

✍️ Practice

  1. Write a pipeline that uses $match then $group to count documents per category, sorted from most to fewest.
  2. Write a pipeline that uses $group with $avg to find the average price per category.

🏠 Homework

  1. On an orders collection, build a pipeline that joins customers with $lookup, unwinds the result, and reports each order with the customer’s name and the order amount.
Want to learn this with a mentor?

CodingClave runs guided, project-based training (28-day, 45-day & 6-month batches).

Explore Training →