Message Queues — Theory (interview deep-dive)
Delivery semantics
- At most once: fire and forget. Messages may be lost.
- At least once: retried until acked, so consumers may see duplicates. This is the default for most queues.
- Exactly once: only achievable end-to-end with idempotent consumers plus dedupe IDs. Brokers can offer "exactly-once delivery" within their own boundaries (Kafka transactions, the SQS FIFO dedup window), but external side effects still need consumer-side dedupe.
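A minimal sketch of the consumer-side half of "exactly once": the broker delivers at least once, and the consumer skips message ids it has already applied. The Message type, the message_id field, and the in-memory set are illustrative assumptions. A real consumer records processed ids durably, in the same transaction as the side effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    message_id: str  # hypothetical producer-assigned unique ID
    payload: int


class DedupingConsumer:
    """Applies each message's effect once even if the broker redelivers it."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()  # in production: a durable inbox table
        self.total = 0  # stands in for an external side effect

    def handle(self, msg: Message) -> bool:
        if msg.message_id in self.seen_ids:
            return False  # duplicate: ack it again but skip the side effect
        self.total += msg.payload
        self.seen_ids.add(msg.message_id)
        return True


# At-least-once delivery: the broker redelivers "m2" after a lost ack.
deliveries = [Message("m1", 10), Message("m2", 5), Message("m2", 5), Message("m3", 1)]
consumer = DedupingConsumer()
applied = [consumer.handle(m) for m in deliveries]
print(consumer.total, applied)  # 16 [True, True, False, True]
```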
Ordering
- Global order is rarely possible at scale.
- Per-partition / per-queue / per-message-group order is achievable.
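- Choose the unit of ordering to be the entity that needs strict order (e.g., userId for user events, orderId for order state).
No system gives you both 100% strict global order and unbounded throughput: anything that must stay in order is processed serially. Keep the ordering unit as narrow as possible.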
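Keying by the ordering unit can be sketched as a stable hash of the key modulo the partition count, so every event for one key lands in the same append-only partition. The partition count of 4 and the use of CRC32 are assumptions for illustration; Kafka's default partitioner hashes keys with murmur2 instead.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption: a small topic for illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash so the same key always maps to the same partition.
    # CRC32 is a stand-in; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("user-1", "signup"),
    ("user-2", "signup"),
    ("user-1", "add_to_cart"),
    ("user-2", "login"),
    ("user-1", "checkout"),
]

partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)
for key, event in events:
    partitions[partition_for(key)].append((key, event))  # append preserves log order

# Each user's events live in one partition, in publish order.
user1 = [e for key, e in partitions[partition_for("user-1")] if key == "user-1"]
print(user1)  # ['signup', 'add_to_cart', 'checkout']
```

Different users may share a partition, which is fine: order across users is not promised, only order within a key.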
Backpressure & flow control
- If producers outpace consumers, the queue grows.
- Strategies:
- Bounded queue + drop / block producer.
- Auto-scale consumers up to partition count.
- Slow producer via API rate limit.
- Shed load (return 503 to upstream).
A growing lag without action is a path to outage.
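Two of the strategies listed above, a bounded queue that sheds load and one that blocks the producer, can be sketched in-process with Python's standard queue.Queue. The capacity of 3 and the 0.1 s timeout are arbitrary illustration values; across services the same idea shows up as broker-side limits, rate limits, or 503s.

```python
import queue

# Bounded buffer between producer and consumer (capacity is an assumption).
buffer = queue.Queue(maxsize=3)

accepted, shed = [], []
for item in range(5):
    try:
        # "Drop / shed": fail fast instead of letting the queue grow without bound.
        buffer.put_nowait(item)
        accepted.append(item)
    except queue.Full:
        shed.append(item)  # e.g. return 503 upstream, or log and drop

print(accepted, shed)  # [0, 1, 2] [3, 4]

# "Block producer": wait up to a timeout for the consumer to free space.
try:
    buffer.put(99, timeout=0.1)
except queue.Full:
    print("still full after 0.1s; push back on the caller")
```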
Poison messages & DLQ
A poison message is one that always fails processing. Without a DLQ it either blocks the queue forever (when order is enforced) or is retried endlessly, burning CPU.
Pattern:
- Try N times (with backoff).
- After N, route to DLQ topic/queue.
- Alert on DLQ depth > 0.
- Manual inspection → fix → replay.
In SQS: RedrivePolicy { maxReceiveCount: 5, deadLetterTargetArn }.
In RabbitMQ: dead-letter exchange + queue.
In Kafka: there is no broker-side DLQ; consumer code (or a framework such as Kafka Connect) writes failed records to a *.dlq topic.
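The try-N-times-then-DLQ pattern can be sketched as a loop with exponential backoff that parks the message, plus its error and attempt count, in a dead-letter store. The function names, the list standing in for a DLQ topic/queue, and the tiny delays are assumptions for illustration; in SQS the broker does the counting for you via maxReceiveCount.

```python
import time
from typing import Callable

MAX_ATTEMPTS = 5  # same value as maxReceiveCount in the SQS example
BASE_DELAY_S = 0.01  # kept tiny for the demo; real systems back off in seconds


def process_with_dlq(
    message: dict,
    handler: Callable[[dict], None],
    dlq: list,
    max_attempts: int = MAX_ATTEMPTS,
) -> bool:
    """Retry with exponential backoff; after max_attempts, park the message in the DLQ."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception as exc:  # poison or transient failure
            if attempt == max_attempts:
                # Keep error and attempt count so an operator can inspect, fix, replay.
                dlq.append({"message": message, "error": repr(exc), "attempts": attempt})
                return False
            time.sleep(BASE_DELAY_S * 2 ** (attempt - 1))
    return False


def always_fails(msg: dict) -> None:  # hypothetical poison-message handler
    raise ValueError(f"cannot parse order {msg['order_id']}")


dead_letters: list = []
ok = process_with_dlq({"order_id": 42}, always_fails, dead_letters)
print(ok, dead_letters[0]["attempts"])  # False 5
if dead_letters:  # "alert on DLQ depth > 0"
    print("ALERT: DLQ depth", len(dead_letters))
```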
Idempotency in consumers
Network retries plus at-least-once delivery mean you will see duplicates, so processing must be safe to repeat:
- Dedupe by messageId (inbox table).
- Conditional updates (UPDATE ... WHERE state = 'pending').
- Use natural keys where possible (INSERT ... ON CONFLICT DO NOTHING).
- Side effects (HTTP, email): send an idempotency key honored within a dedupe window, or accept rare duplicates.
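The inbox table and the conditional update can be combined in one transaction, sketched here with Python's built-in sqlite3 (SQLite accepts INSERT ... ON CONFLICT DO NOTHING since 3.24). The table names, the "paid" transition, and handle_payment are hypothetical; the point is that the dedupe record and the state change commit together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE inbox (message_id TEXT PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT NOT NULL);
    INSERT INTO orders (id, state) VALUES (1, 'pending');
    """
)


def handle_payment(conn: sqlite3.Connection, message_id: str, order_id: int) -> bool:
    """Mark an order paid at most once; returns True only if state changed."""
    with conn:  # one transaction: inbox insert and state change commit together
        cur = conn.execute(
            "INSERT INTO inbox (message_id) VALUES (?) ON CONFLICT (message_id) DO NOTHING",
            (message_id,),
        )
        if cur.rowcount == 0:
            return False  # this message id was already processed
        cur = conn.execute(
            "UPDATE orders SET state = 'paid' WHERE id = ? AND state = 'pending'",
            (order_id,),
        )
        return cur.rowcount == 1  # 0 means the order was no longer pending


print(handle_payment(conn, "msg-7", 1))  # True
print(handle_payment(conn, "msg-7", 1))  # False (duplicate delivery)
print(handle_payment(conn, "msg-8", 1))  # False (new message, but order already paid)
```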
Outbox pattern (revisited for queues)
Solves the dual-write problem between the database and the queue:
- In the same DB transaction, write the business state and insert a row into the outbox table.
- A relay process publishes outbox rows, then marks them published.
- The consumer dedupes by message id (inbox), because the relay can publish a row more than once.
Without an outbox, you lose the event when the DB commit succeeds but the publish fails.
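A minimal outbox and relay, sketched with sqlite3. The schema, place_order, relay_once, and the publish callback (standing in for a real broker client) are hypothetical. The relay publishes before marking, so a crash between the two steps re-sends the row on the next run: at-least-once, which the consumer's inbox absorbs.

```python
import json
import sqlite3
import uuid
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
    CREATE TABLE outbox (
        id TEXT PRIMARY KEY,  -- doubles as the message id consumers dedupe on
        topic TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    """
)


def place_order(conn: sqlite3.Connection, order_id: int, total_cents: int) -> None:
    with conn:  # business row and outbox row commit (or roll back) together
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)", (order_id, total_cents)
        )
        conn.execute(
            "INSERT INTO outbox (id, topic, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "order.placed", json.dumps({"order_id": order_id})),
        )


def relay_once(conn: sqlite3.Connection, publish: Callable[[str, str, str], None]) -> int:
    """Publish unpublished rows, then mark each one published."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY rowid"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, row_id, payload)  # hypothetical broker call
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent = []
place_order(conn, 1, 2500)
print(relay_once(conn, lambda topic, msg_id, body: sent.append((topic, body))))  # 1
print(relay_once(conn, lambda topic, msg_id, body: sent.append((topic, body))))  # 0
```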
RabbitMQ deep notes
- Each queue has a leader on one node (classic mirroring, now deprecated, added replicas; quorum queues replicate with Raft).
- Connections are heavyweight TCP connections; share one per process and open lightweight channels within it, rather than a connection per thread or task.
- Prefetch (basic.qos) limits unacked messages in flight per consumer; set it to a small number (10-50) for fair dispatch.
- tx.select is slow; use publisher confirms instead.
- Persistent message + durable queue + publisher confirm + manual ack = strong durability.
SQS deep notes
- Visibility timeout is critical: you must process and delete a message within the window, or it becomes visible again and is redelivered. Set it longer than the worst-case processing time.
- Long vs short polling: prefer long polling (WaitTimeSeconds up to 20 s) to cut empty receives, and therefore cost, without adding latency.
- Standard queues may redeliver and reorder messages; design for both.
- FIFO queues deliver each MessageGroupId in order with one in-flight batch at a time, and have throughput limits; choose MessageGroupId so load spreads across many groups.
- Cost is per request; batching (up to 10 messages per call) helps.
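The visibility-timeout behavior can be sketched as a toy in-memory queue. ToyQueue is a hypothetical model, not the SQS API (real SQS uses receipt handles and boto3 calls); time is passed in explicitly so the redelivery is deterministic, and receive_counts plays the role of ApproximateReceiveCount.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class ToyQueue:
    """Toy model of SQS-style visibility timeouts (not the real SQS API)."""

    visibility_timeout: float
    messages: dict = field(default_factory=dict)  # id -> body
    invisible_until: dict = field(default_factory=dict)  # id -> time it reappears
    receive_counts: dict = field(default_factory=dict)  # like ApproximateReceiveCount

    def send(self, msg_id: str, body: str) -> None:
        self.messages[msg_id] = body

    def receive(self, now: float) -> Optional[Tuple[str, str]]:
        for msg_id, body in self.messages.items():
            if self.invisible_until.get(msg_id, 0.0) <= now:
                # Hide the message from other consumers for the timeout window.
                self.invisible_until[msg_id] = now + self.visibility_timeout
                self.receive_counts[msg_id] = self.receive_counts.get(msg_id, 0) + 1
                return msg_id, body
        return None  # nothing visible right now

    def delete(self, msg_id: str) -> None:
        self.messages.pop(msg_id, None)
        self.invisible_until.pop(msg_id, None)


q = ToyQueue(visibility_timeout=30.0)
q.send("m1", "resize image 1")
first = q.receive(now=0.0)  # consumer A takes m1; hidden until t=30
hidden = q.receive(now=10.0)  # still invisible
again = q.receive(now=31.0)  # A was too slow: m1 reappears for consumer B
q.delete("m1")  # B finishes and deletes it
print(first, hidden, again, q.receive_counts["m1"], q.receive(now=100.0))
# ('m1', 'resize image 1') None ('m1', 'resize image 1') 2 None
```

This is exactly the "two consumers got the same message" scenario: a timeout shorter than the processing time yields a duplicate.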
Redis Streams notes
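- XADD with MAXLEN ~ N caps memory (~ allows cheaper approximate trimming).
- Consumer group + XREADGROUP with > reads only messages never yet delivered to the group.
- XCLAIM after min-idle-time takes over stuck pending messages (XAUTOCLAIM, Redis 6.2+, combines the scan and claim).
- AOF is needed for durability; with RDB snapshots only, writes since the last snapshot are lost on a crash.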
Common interview Qs
Section titled “Common interview Qs”- You see message lag growing — debug. More consumers? Slow processing? Downstream slow? Hot partition?
- Two consumers got the same message — why and how to handle? Visibility timeout expired (SQS) or redelivery on broker restart. Idempotent consumer.
- Need strict order for user X’s events. Use partition / message group keyed by user id.
- A broker is down: what happens? The producer either errors immediately or buffers locally; decide based on the SLA. Many brokers support replicated/HA setups to reduce the risk.
- How would you migrate from RabbitMQ to Kafka? Run both: dual-publish from producers, switch consumers over (deduping during the overlap), then retire the old broker.
- When to NOT use a queue? Synchronous user-facing op; tiny event volumes (DB row + cron is enough); strict global ordering at huge scale.
- Difference between fanout and pub/sub? Same idea (broadcast): a fanout exchange in RabbitMQ; multiple SQS queues subscribed to one SNS topic on AWS.
- What’s the difference between a topic in Kafka vs RabbitMQ? Kafka topic = partitioned log replayed by offset. RabbitMQ topic = exchange type for routing keys.
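- DLQ pattern: where does the retry counter live? In a message header or attribute: broker-maintained in some setups (e.g., SQS ApproximateReceiveCount, RabbitMQ x-death), application-managed in others (e.g., a Kafka record header written by the consumer).
- Why might you avoid relying on Kafka exactly-once for cross-system writes? Kafka's exactly-once semantics (EOS) apply only within Kafka; DB writes still need idempotency.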
Choosing — a quick decision tree
- Already on AWS? Standard queue with at-least-once works → SQS.
- Need replay + analytics + huge volume → Kafka.
- Need rich routing patterns and per-message ack → RabbitMQ.
- Lightweight job queue, already on Redis → Redis Streams / BullMQ.
- Edge / IoT / mixed messaging → NATS JetStream.
- Multi-tenant geo-replicated → Pulsar.
Sizing rules of thumb
- Active consumers ≤ number of partitions (or queue shards): in a Kafka consumer group each partition goes to at most one consumer, so extras sit idle.
- Visibility timeout = 2-3× p99 processing time.
- Prefetch / poll batch size depends on per-msg processing time × parallelism.
- Max disk needed ≈ backlog growth rate × retention time (× replication factor for replicated logs).
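The sizing rules are simple arithmetic; here is a sketch with example numbers. All inputs (p99 of 20 s, 500 msg/s at 40 ms each, 12 partitions, 5 MB/s growth, 7-day retention, replication factor 3) are made-up illustration values, and consumers_needed assumes each consumer processes one message at a time.

```python
import math


def visibility_timeout_s(p99_processing_s: float, factor: float = 3.0) -> float:
    # Rule of thumb: 2-3x p99 processing time.
    return p99_processing_s * factor


def consumers_needed(msgs_per_s: float, per_msg_s: float) -> int:
    # Assumes one message at a time per consumer: each handles 1/per_msg_s msg/s.
    return math.ceil(msgs_per_s * per_msg_s)


def max_disk_bytes(growth_bytes_per_s: float, retention_s: float, replication: int = 1) -> float:
    # Growth rate x retention, multiplied by the replication factor for replicated logs.
    return growth_bytes_per_s * retention_s * replication


print(visibility_timeout_s(20.0))  # 60.0

need, partitions = consumers_needed(msgs_per_s=500, per_msg_s=0.04), 12
if need > partitions:
    # Extra consumers beyond the partition count would sit idle.
    print(f"need {need} consumers but only {partitions} partitions: add partitions")

seven_days_s = 7 * 24 * 3600
print(max_disk_bytes(5e6, seven_days_s, replication=3) / 1e12, "TB")  # 9.072 TB
```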
Anti-patterns
- Using a queue as a database.
- Per-message DB connection (use pool).
- No DLQ.
- Letting one slow message block all others (unbounded retries).
- Mixing message types in one queue without versioning.
- Synchronous request-reply over a queue when HTTP would do.
- Skipping idempotency assuming “exactly once” delivery.