MongoDB — Theory

MongoDB — Theory (interview deep-dive)

When document model wins / loses

Wins:

Polymorphic / sparse data — fields vary per record.
Hierarchical data accessed together (orders w/ line items, blog posts w/ comments).
High write throughput on simple CRUD.
Rapid iteration when schema isn’t settled.

Loses:

Complex multi-entity transactions and joins.
Strong relational integrity / FK constraints.
Reporting / analytics with arbitrary joins.
When data fundamentally is relational.

Schema design — embed vs reference

Embed when:

Data accessed together always.
Child cannot exist independently.
Subdoc count is bounded.
Read-heavy.

Reference when:

Many-to-many relationships.
Subdoc grows unbounded (comments → millions).
Subdoc accessed independently.
Write contention high (entire doc rewrites).

Patterns: subset pattern (embed N most recent, reference rest), bucket pattern (group time-series), computed pattern (precompute aggregates).

Storage engine — WiredTiger

Default since 3.2. Document-level locking, snapshot isolation, MVCC.
Compression: snappy (default), zlib, zstd.
Cache: 50% of RAM minus 1GB.
Checkpoints every 60s. Journal for durability between checkpoints.

Replica sets — internals

Primary accepts writes, replicates to secondaries via oplog (capped collection in local DB).
Secondaries pull oplog, apply ops idempotently.
Election: secondary detects primary down → starts election → majority votes → new primary. Raft-like protocol since 3.2.
Need majority of voting members alive to elect / accept writes.
Read from secondary: eventual consistency unless using majority read concern.

Write concern & read concern

Write concern w: 'majority', j: true: ack only when majority replicated AND journaled. Production default for important data.
Write concern w: 1: primary ack only. Risk: rolled back on failover.
Read concern majority: read data acked by majority, can’t be rolled back.
Read concern linearizable: wait until all writes seen by all members. Slow.

Sharding deep-dive

Each shard is a replica set.
Chunks of contiguous shard key range, default 128MB. Balancer moves chunks.
Choosing shard key:
- Cardinality: many distinct values.
- Frequency: low — no value dominates.
- Monotonicity: avoid (timestamps, ObjectId) → all writes hit one shard. Use hashed instead.
- Query alignment: queries should include shard key for routed (single-shard) queries; otherwise scatter-gather.
Cannot change shard key easily (5.0+ refinable, but limited).
Zone sharding: pin chunks to specific shards (e.g., regional data residency).

Indexes — gotchas

Index intersection rare; usually single index used per query.
Compound index: leftmost-prefix wins. {a:1, b:1, c:1} covers {a}, {a,b}, {a,b,c}, NOT {b} or {b,c}.
ESR rule: Equality, then Sort, then Range field order in compound.
Multikey index can’t include another array field.
Each write updates every index — too many = slow writes.
Index builds: background: true (default in 4.2+), or rolling builds in replica sets.

Aggregation tips

Push $match and $project early to reduce data through pipeline.
$lookup can be slow if foreign collection unindexed — index the joined field.
Use $facet to compute multiple aggregations in one pass.
Allow disk use for big sorts: { allowDiskUse: true }.
View execution stats: db.coll.aggregate([...]).explain('executionStats').

Transactions

Multi-doc since 4.0 (replica) / 4.2 (sharded).
Costly — runs at snapshot isolation, requires majority write concern, default 60s timeout.
Best practice: design data to avoid needing them (single-doc atomic updates, embed).
Retry on TransientTransactionError.

Common interview Qs

When NOT to use Mongo? Heavy joins, strict ACID across collections, relational reporting.
How do you model 1:N where N is unbounded (e.g. user → posts)? Reference, not embed.
Why is $skip for pagination slow at large offsets? Linear scan. Use range queries on indexed sort key (keyset).
How does Mongo guarantee durability? Journal (write-ahead log), then checkpoint to data files.
What’s a covered query? All fields needed are in index — no document fetch.
Replica set with 3 nodes: 1 dies — still writable? Yes (2/3 majority). 2 die — read-only.
How to do a zero-downtime schema change? Add new field, dual-write, backfill, switch reads, drop old.
Sharding with date-based key — what’s wrong? Monotonic — hot shard. Use hashed.
Why would $lookup be discouraged in OLTP path? Latency unpredictable; no index intersection. Better to embed or denormalize.

Compared to PostgreSQL

	PostgreSQL	MongoDB
Schema	strict, evolvable	flexible
Joins	native, fast	`$lookup`, slower
Transactions	mature, fast	mature 4.0+, costlier
Indexing	rich (GIN, partial, expr, BRIN)	rich (compound, multikey, text, geo)
Replication	streaming + logical	replica set, automatic failover
Sharding	manual / Citus	native
JSONB	indexable, queryable	native model

PG with jsonb covers many Mongo use cases. Pick Mongo when document model + automatic sharding are first-class needs.

Deep dive — ACID in MongoDB

MongoDB became ACID for single documents always, multi-document transactions on replica sets in 4.0 (June 2018), and distributed multi-document transactions on sharded clusters in 4.2.

Transactions tie to a session, default to a 60-second timeout, and prefer to be single-shard — cross-shard transactions use 2-phase commit and are materially slower. The recommended callback API (session.withTransaction) auto-retries TransientTransactionError and UnknownTransactionCommitResult.

Defaults are weak. MongoDB transactions inherit readConcern:"local" and w:1 from the operation. You must set readConcern:"snapshot" + writeConcern:"majority" per transaction for true snapshot isolation. Jepsen 4.2.6 (2020) found read-skew/G1c anomalies even then, later attributed to a retry bug (SERVER-48307).

Retryable writes (separate from transactions, default since 3.6) automatically retry single-document writes that hit network errors — they make at-most-once delivery practical on flaky networks.

Gotchas

MongoDB transactions default to readConcern:"local" + w:1 — they ignore collection/db-level safety settings. Set them per transaction.
readConcern:"snapshot" only gives true snapshot isolation when paired with writeConcern:"majority".
Default Mongo transaction timeout is 60 seconds; long-running ones abort.
Mongo cross-shard transactions are slow (2PC) — design schema so transactions are single-shard whenever possible.

Q: Is MongoDB ACID?

Yes, since 4.0 for replica-set multi-doc transactions and 4.2 for sharded clusters, but with caveats: weak defaults, snapshot isolation requires explicit snapshot + majority write concern, and Jepsen 4.2.6 found read-skew/G1c anomalies later attributed to a retry bug. Single-document writes have always been atomic.

Q: How would you implement money transfer in MongoDB?

Either single-document (embed both balances — fine for joint accounts), or a multi-doc transaction with readConcern:"snapshot", writeConcern:"majority", and the callback API for retries. Make sure both accounts live on the same shard (same shard-key prefix) to avoid cross-shard 2PC. For high throughput, prefer event-sourced ledgers in Postgres.

Sources: mongodb.com/docs transactions, Jepsen MongoDB 4.2.6.

Deep dive — ESR rule (Equality, Sort, Range)

MongoDB’s compound-index design rule is ESR: list Equality predicates first, then Sort fields, then Range predicates.

Reasoning:

Equality keys collapse the index scan to a contiguous range immediately.
Sort keys placed after equality keys let the index serve the sort directly (no in-memory SORT stage).
Range keys placed last let the scan terminate cleanly without breaking the sort.

Reverse the order and you either re-sort in memory or scan more index entries than necessary.

When $in is combined with .sort(), fewer than 201 elements are “exploded” into separate index scans merged via SORT_MERGE (equality-like for ESR); ≥201 elements falls back to a range-style scan and the sort is no longer index-served. $ne, $nin, $regex, $gt/$gte/$lt/$lte are ranges.

Index types

Type	Notes
Single-field	Default
Compound	Up to 32 fields, leftmost-prefix rule
Multikey	Created automatically when indexed field is an array. Compound index can contain at most one array field
Text	Full-text, one per collection
2dsphere	Geospatial on GeoJSON
Wildcard (`{ "$**": 1 }`)	Indexes all fields. Heavy on writes. Can’t sort, can’t be shard key
Hashed	Used for sharding to break monotonic-key hot spots. Only one hashed field per compound index

Index intersection exists but the planner rarely picks it over a well-designed compound index — don’t rely on it.

Gotchas

Putting a range field before sort = in-memory sort (and a 32 MB cap unless allowDiskUse).
Compound indexes hold at most one array field — multikey can’t combine two arrays.
$in flips between equality and range at 201 elements — large $in lists silently degrade ESR plans.
Wildcard indexes don’t support sorting and can’t be used as shard keys; they’re heavy on writes.
If you have both {a:1} and {a:1, b:1}, drop {a:1} — the compound serves all a-prefix queries.

Q: Walk me through ESR with an example.

Query: find all orders for tenantId=42 with total > 100, sorted by createdAt. Index: { tenantId:1, createdAt:1, total:1 }. tenantId (equality) shrinks to one tenant’s keys; createdAt (sort) is already ordered in the index, so no in-memory sort; total (range) filters at scan time and stops cleanly. Reversing to {tenantId:1, total:1, createdAt:1} would force MongoDB to sort in memory.

Q: When does ESR break down?

When the range predicate is so selective it cuts the working set far more than the sort matters. Then ERS (range before sort) wins because you sort fewer documents — but you eat an in-memory sort. Verify with explain("executionStats") comparing totalKeysExamined, totalDocsExamined, and whether a SORT stage appears.

Sources: mongodb.com ESR rule, mongodb.com compound indexes.

Deep dive — aggregation pipeline

The aggregation pipeline is a sequence of stages where each stage’s output is the next stage’s input.

Common stages

Stage	Purpose
`$match`	Filter — push as early as possible to use indexes
`$project` / `$addFields` / `$set`	Reshape/compute
`$group`	Aggregate by key; needs all data in memory unless `allowDiskUse:true` (100 MB per-stage limit)
`$sort`	Uses an index if first stage or follows index-using `$match`
`$limit` / `$skip`
`$lookup`	Left-outer join from another collection
`$unwind`	Deconstruct arrays into multiple documents
`$facet`	Run several sub-pipelines on the same input — “search results + counts + facets”
`$bucket` / `$bucketAuto`	Histograms
`$graphLookup`	Recursive lookup (org chart)
`$merge` / `$out`	Write results back to a collection. `$merge` upserts, `$out` replaces

Pipeline optimization

The query planner does meaningful pipeline optimization:

Reorders $match ahead of $project/$addFields when filters don’t depend on computed fields.
Moves $match ahead of $sort.
Coalesces adjacent $limits (takes the smaller), sums adjacent $skips.
Fuses $sort + $limit into a top-N with bounded memory.
Coalesces $lookup + $unwind + $match into a single $lookup with an inner pipeline to avoid materializing huge intermediate arrays.

Since 5.2 / 6.0 the slot-based execution engine (SBE) accelerates $group and certain $lookup shapes. The biggest practical pitfall is $lookup on an unindexed foreign key on the foreign collection — it becomes O(N×M).

Gotchas

$lookup without an index on the foreign field’s join key is O(N×M) — always index it.
$unwind after $lookup can multiply documents enormously; use $lookup’s inline pipeline + $unwind immediately so the planner can coalesce.
$group is memory-bound (100 MB per stage) — use allowDiskUse:true for big cardinalities.
$out replaces the target collection atomically — wrong target = wiped collection. Prefer $merge for incremental updates.
$sort only uses an index if it can run as the first stage or directly follows an index-using $match; once a $group/$project/$unwind has run, sorts go to memory/disk.

Q: How do you make a slow $lookup fast?

Index the foreign collection on the join field (and any inner-pipeline filter fields, ESR-style). Push filtering into the $lookup’s pipeline so the planner coalesces with the following $unwind/$match. If the foreign collection is huge, consider denormalizing — embedding a snapshot of the joined fields and updating them on writes is the Mongo-idiomatic answer.

Q: Read this explain — what stages should I look for?

winningPlan.stage of IXSCAN (good) vs COLLSCAN (bad). totalKeysExamined close to nReturned (selective). No SORT stage if you indexed correctly. For $lookup, the new EQ_LOOKUP stage means SBE accelerated it. Watch executionTimeMillis per stage.

Sources: mongodb.com aggregation-pipeline-optimization, MongoDB University M121.

Deep dive — sharding details

MongoDB is built around the replica set: one primary, multiple secondaries, automatic failover via a Raft-like election (must have a majority of voting members up).

Sharding distributes a collection across shards by shard key — chosen up front and historically immutable, though 4.4 added refinable shard keys (you can append fields) and 5.0 added full resharding.

Three shard-key properties matter:

High cardinality — else few chunks.
Low frequency of any single value — else jumbo chunks.
Non-monotonic — else the active chunk is always on one shard (hot shard).

Hashed shard keys solve the monotonic problem at the cost of range-query locality.

Gotchas

Mongo shard-key choice is the single biggest performance decision and was historically immutable — 4.4 added refinable, 5.0 added resharding, but resharding is expensive on large collections.
Monotonic shard keys (_id ObjectId, timestamps, sequences) cause hot-shard write bottlenecks — use hashed sharding or a compound key.
Reading from secondaries gives stale data; for read-your-writes, use readConcern:"majority" and writeConcern:"majority" together, or read from primary.

Q: I’ve sharded on _id (an ObjectId) and writes are bottlenecked on one shard. Why?

ObjectId is monotonically increasing (timestamp + counter), so all current writes target the chunk holding the max range, which lives on one shard. The fix is hashed sharding on _id (or the natural key), or a compound shard key whose leading field has high enough cardinality to spread writes (e.g. {tenantId:1, _id:1}). Since 5.0 you can reshardCollection without downtime, but it’s expensive — prefer to get this right initially.

Sources: mongodb.com shard-key, Kleppmann DDIA ch. 5–6.