Schema Design: Embed vs Reference
The single most important real-world MongoDB skill: deciding how to shape your documents so the database stays fast and your code stays simple.
What you will learn
- Choose between embedding and referencing for related data
- Model one-to-one, one-to-many and many-to-many relationships
- Recognise common schema anti-patterns and avoid them
The big question: together or apart?
Every app has related data — an order has items, a post has comments, a user has an address. In MongoDB you have two ways to store related data, and choosing well is the skill that separates someone who “can save documents” from someone who can design a real database.
- Embedding: nest the related data inside the parent document, as a sub-object or an array. One document holds everything.
- Referencing: keep the related data as separate documents in their own collection and link to them by _id (this is the link that populate follows). The data lives apart and is joined when you need it.
Here is the same blog post modelled both ways so you can see the difference:
```javascript
// EMBEDDING: comments live inside the post
{
  _id: 1,
  title: "Hello MongoDB",
  comments: [
    { user: "Asha", text: "Great post!" },
    { user: "Ravi", text: "Thanks!" }
  ]
}

// REFERENCING: comments are their own documents, linked by postId
// posts collection
{ _id: 1, title: "Hello MongoDB" }

// comments collection
{ _id: 11, postId: 1, user: "Asha", text: "Great post!" }
{ _id: 12, postId: 1, user: "Ravi", text: "Thanks!" }
```

With embedding, one read of the post also returns its comments. That is fast and simple, because data that is read together is stored together. With referencing, the post and its comments are separate documents that you join at read time (with populate or $lookup). Embedding is usually the better default in MongoDB. Referencing is for the specific cases below.
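To see the two shapes side by side in running code, here is a minimal sketch in plain JavaScript. It assumes no database at all. Arrays stand in for the posts and comments collections. The lookup function is a hypothetical helper that only imitates the basic equality match a $lookup stage performs (localField, foreignField, as). It is not the MongoDB or Mongoose API.

```javascript
// In-memory sketch: plain arrays stand in for collections; no driver is used.

// Embedded shape: the comments travel inside the post.
const embeddedPost = {
  _id: 1,
  title: "Hello MongoDB",
  comments: [
    { user: "Asha", text: "Great post!" },
    { user: "Ravi", text: "Thanks!" },
  ],
};

// Referenced shape: two separate "collections" linked by postId.
const posts = [{ _id: 1, title: "Hello MongoDB" }];
const comments = [
  { _id: 11, postId: 1, user: "Asha", text: "Great post!" },
  { _id: 12, postId: 1, user: "Ravi", text: "Thanks!" },
];

// Hypothetical helper imitating a $lookup-style equality join: for each local
// document, collect the foreign documents whose foreignField equals the local
// document's localField, and attach them under the name given by `as`.
function lookup(local, foreign, localField, foreignField, as) {
  return local.map((doc) => ({
    ...doc,
    [as]: foreign.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// One read is enough for the embedded shape...
const fromEmbedded = embeddedPost.comments.length;

// ...while the referenced shape needs a join to build the same view.
const [joinedPost] = lookup(posts, comments, "_id", "postId", "comments");

console.log(fromEmbedded, joinedPost.comments.length); // 2 2
```

The embedded post answers the question in one step. The referenced version needs the extra join every time the screen shows comments, and that extra step is the cost of keeping the data apart.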
The decision rule
A simple guide that covers most situations: embed data you read together and that belongs only to its parent; reference data that is shared, large, or grows without limit.
| Embed when… | Reference when… |
|---|---|
| Data is read together with the parent | Data is queried on its own as well |
| Data belongs only to this parent | Data is shared by many parents |
| The array stays small and bounded | The list could grow without limit |
| Example: an order’s line items | Example: a post’s author (a User) |
Two quick worked examples. An order’s line items belong only to that order, are always read with it, and there are only a handful — so embed them as an array inside the order. A post’s author is a User who has their own profile, their own posts, and is shared across the app — so reference the user by _id (you saw this exact case in the populate lesson).
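The table works as a small rule of thumb: if any "reference" condition holds, keep the data apart. Otherwise, embed it. The sketch below encodes that rule. chooseShape and its three flags are hypothetical names that mirror the right-hand column of the table. Real designs still need judgment about the queries your app actually runs.

```javascript
// Hypothetical encoding of the embed-vs-reference table as a function.
// Each flag corresponds to one row of the "Reference when..." column.
function chooseShape({ queriedOnItsOwn, sharedByManyParents, growsWithoutLimit }) {
  // Any single reference condition is enough to keep the data apart.
  if (queriedOnItsOwn || sharedByManyParents || growsWithoutLimit) {
    return "reference";
  }
  // Otherwise the data is read with its parent, owned by it, and bounded.
  return "embed";
}

// An order's line items: owned by the order, read with it, only a handful.
const lineItems = chooseShape({
  queriedOnItsOwn: false,
  sharedByManyParents: false,
  growsWithoutLimit: false,
});

// A post's author: a User with their own profile, shared across the app.
const author = chooseShape({
  queriedOnItsOwn: true,
  sharedByManyParents: true,
  growsWithoutLimit: false,
});

console.log(lineItems, author); // embed reference
```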
The three relationship shapes
Almost every relationship is one of three shapes. Here is how each maps to embed-or-reference:
- One-to-one (a user and their profile settings) — usually embed: just nest the settings object inside the user. One read gets everything.
- One-to-many (a post and its comments, an order and its items) — embed if the “many” is small and bounded (a few comments), reference if it can grow without limit (a celebrity post with 50,000 comments would make one giant document).
- Many-to-many (students and the courses they enrol in) calls for referencing: keep an array of courseId values on the student (or an array of student ids on the course), and follow those ids to reach the other side.
```javascript
// One-to-many done by REFERENCE (comments could grow large):
// the post document has no comments array; each comment stores postId.
// The post also references its author (a User) by _id.
// post document
{ _id: 1, title: "Hello", author: ObjectId("...u1") }

// Many-to-many: a student holds an array of course ids
// (ids shortened for readability; real ObjectIds are 24 hex characters)
{ _id: "s1", name: "Asha", courses: [ObjectId("c1"), ObjectId("c2")] }
```

Note: There is no command output here. The "result" is a data model. The student document links to two courses by their ids, and you would use $lookup or populate to pull in the full course details when you need them.
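To see how the id arrays are followed in practice, here is an in-memory sketch. The course titles and the second student are made up for illustration. Plain strings stand in for ObjectIds so the example runs without a database. In MongoDB itself, a query such as { courses: someId } matches any document whose courses array contains that id, which is what makes the reverse direction easy to express.

```javascript
// In-memory sketch of a many-to-many link stored as an id array.
// Strings stand in for ObjectIds; titles and the second student are hypothetical.
const courses = [
  { _id: "c1", title: "Databases" },
  { _id: "c2", title: "Web APIs" },
  { _id: "c3", title: "Statistics" },
];
const students = [
  { _id: "s1", name: "Asha", courses: ["c1", "c2"] },
  { _id: "s2", name: "Ravi", courses: ["c2", "c3"] },
];

// Forward direction: turn one student's course ids into full course documents,
// similar in spirit to a $lookup whose localField holds an array of ids.
function coursesFor(student) {
  return courses.filter((c) => student.courses.includes(c._id));
}

// Reverse direction: no extra field is needed; find every student whose
// courses array contains the given id.
function studentsIn(courseId) {
  return students.filter((s) => s.courses.includes(courseId)).map((s) => s.name);
}

console.log(coursesFor(students[0]).map((c) => c.title).join(", ")); // Databases, Web APIs
console.log(studentsIn("c2").join(", ")); // Asha, Ravi
```

Storing the ids on one side only keeps a single source of truth. The other direction is a query rather than a second array you must keep in sync.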
Anti-patterns to avoid
A few mistakes that bite people in real apps:
- Massive, unbounded arrays — embedding something that grows forever (every comment, every log entry) makes documents huge and slow. MongoDB documents also have a 16 MB size limit. Reference instead.
- Over-referencing out of SQL habit — splitting everything into separate collections like a relational database means you constantly join data that is always read together. Embed what belongs together.
- Duplicating data you must keep in sync — copying a user’s name into every order is fine for a historical snapshot, but if you must update it everywhere when it changes, reference it instead.
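Some quick arithmetic shows how fast an unbounded embedded array eats into the 16 MB limit. This sketch assumes that JSON text length is a fair stand-in for BSON size. Real BSON adds type bytes and length prefixes, so every figure is a rough estimate. The sample comment is invented for illustration.

```javascript
// Back-of-the-envelope sketch: JSON length used as a proxy for BSON size.
const MAX_DOC_BYTES = 16 * 1024 * 1024; // the 16 MB document limit, in bytes

// Hypothetical comment with 120 characters of text.
const sampleComment = { user: "Asha", text: "Great post! ".repeat(10) };

// +1 approximates the comma separating items inside the comments array.
const bytesPerComment = JSON.stringify(sampleComment).length + 1;

// How many such comments fit before one post document is full?
const maxComments = Math.floor(MAX_DOC_BYTES / bytesPerComment);

// The celebrity post from the one-to-many example: 50,000 comments.
const bytesFor50k = 50000 * bytesPerComment;

console.log(bytesPerComment); // 146
console.log(maxComments); // 114912
console.log((bytesFor50k / (1024 * 1024)).toFixed(1) + " MB"); // 7.0 MB
```

Even well before the hard limit, a post carrying roughly 7 MB of comments is a lot to load whenever the app fetches the whole document. That is the "huge and slow" half of this anti-pattern.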
Tip: Design for the queries your app actually runs, not for theoretical neatness. Ask “what does the screen need to show, and in how few reads can I get it?” — then shape your documents to make that easy.
Q. A blog post can receive an unlimited number of comments. How should you store the comments?
✍️ Practice
- For a shopping app, decide embed-or-reference for: a product’s reviews, an order’s line items, and a product’s category. Justify each.
- Sketch documents for a many-to-many relationship between students and courses using id arrays.
🏠 Homework
- Take an app idea of your own and write out its full data model, noting for every relationship whether you embed or reference and why.