
Message Queues — Theory

Delivery semantics:

  • At most once: fire and forget. Messages may be lost.
  • At least once: retried until acknowledged, so consumers may see duplicates. This is the default for most queues.
  • Exactly once: only achievable end-to-end with idempotent consumers plus dedupe IDs. Brokers can offer “exactly-once” guarantees within their own boundaries (Kafka transactions, the SQS FIFO deduplication window), but external side effects still need consumer-side dedupe.

Ordering:

  • Global order is rarely possible at scale.
  • Per-partition, per-queue, or per-message-group order is achievable.
  • Choose the unit of ordering to be the entity that needs strict order (e.g., userId for user events, orderId for order state).

No system gives you both 100% strict order and maximum throughput. Trade one for the other.
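
To make the "unit of ordering" idea concrete, here is a minimal sketch of key-based partitioning. The function name `partition_for` and the choice of SHA-256 are assumptions for illustration only; real brokers use their own partitioners (Kafka's default, for instance, uses a different hash).

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map an ordering key (e.g. a userId) to a stable partition index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Events for the same user always land in the same partition, so their
# relative order is preserved; events for different users may interleave.
events = [("user-1", "signup"), ("user-2", "signup"), ("user-1", "upgrade")]
partitions = {}
for user_id, event in events:
    partitions.setdefault(partition_for(user_id, 8), []).append((user_id, event))
```

Note that changing `num_partitions` remaps keys, so ordering for a key is only guaranteed while the partition count stays fixed.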

  • Producer faster than consumer = queue grows.
  • Strategies:
    • Bounded queue + drop / block producer.
    • Auto-scale consumers up to partition count.
    • Slow producer via API rate limit.
    • Shed load (return 503 to upstream).

A growing lag without action is a path to outage.
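
The bounded-queue strategy above can be sketched with Python's standard `queue.Queue`. The helper name `offer` and its `policy` parameter are hypothetical; the point is only to show the difference between dropping and blocking when the buffer is full.

```python
import queue

def offer(q: queue.Queue, item, policy: str = "drop", timeout: float = 0.1) -> bool:
    """Try to enqueue; return False if the item was rejected (load shed)."""
    try:
        if policy == "block":
            q.put(item, timeout=timeout)  # producer waits for free space
        else:
            q.put_nowait(item)            # producer never waits
        return True
    except queue.Full:
        return False  # caller can drop the item or translate this into HTTP 503

q = queue.Queue(maxsize=2)
results = [offer(q, i) for i in range(4)]  # nothing is consuming, so the queue fills up
```

With a capacity of 2 and no consumer, the first two offers succeed and the rest are rejected, which is the signal an upstream API would turn into a rate limit or a 503.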

A dead-letter queue (DLQ) handles a message that always fails. Without a DLQ, such a message blocks the queue forever or burns CPU on endless retries.

Pattern:

  1. Try N times (with backoff).
  2. After N, route to DLQ topic/queue.
  3. Alert on DLQ depth > 0.
  4. Manual inspection → fix → replay.

In SQS: RedrivePolicy { maxReceiveCount: 5, deadLetterTargetArn }. In RabbitMQ: dead-letter exchange + queue. In Kafka: write to *.dlq topic from consumer code.
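
For brokers where the consumer has to implement this itself (the Kafka case above), steps 1 and 2 of the pattern might look roughly like the following. This is an in-memory sketch: `process_with_dlq`, the list standing in for the DLQ, and the default delays are all assumptions, not any client library's API.

```python
import time

def process_with_dlq(message, handler, dlq, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call handler(message) up to max_attempts times, then dead-letter it."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception as exc:
            last_error = repr(exc)
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    # Retries exhausted: park the message with context for later inspection.
    dlq.append({"message": message, "attempts": max_attempts, "error": last_error})
    return False

def always_fails(msg):
    raise ValueError("bad payload")

dlq = []
ok = process_with_dlq({"id": "m1"}, always_fails, dlq, sleep=lambda s: None)
```

Storing the attempt count and last error alongside the message makes the manual inspection and replay step much easier.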

Network retries plus at-least-once delivery mean you will see duplicates, so consumers must be idempotent:

  • Dedupe by messageId (inbox table).
  • Conditional updates (UPDATE WHERE state='pending').
  • Use natural keys where possible (INSERT ... ON CONFLICT DO NOTHING).
  • Side effects (HTTP, email): send an idempotency key that the receiver dedupes within a time window, or accept rare duplicates.
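
The inbox-table approach can be sketched with the standard-library `sqlite3` module. The table names, the `handle_once` helper, and the balance example are assumptions for illustration; the `ON CONFLICT ... DO NOTHING` clause needs SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbox (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('acct-1', 0)")
conn.commit()

def handle_once(conn, message_id, account, delta):
    """Apply the credit only if this message_id has not been seen before."""
    with conn:  # one transaction: inbox insert and balance update commit together
        cur = conn.execute(
            "INSERT INTO inbox (message_id) VALUES (?) ON CONFLICT(message_id) DO NOTHING",
            (message_id,),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: skip the side effect
        conn.execute(
            "UPDATE balances SET amount = amount + ? WHERE account = ?",
            (delta, account),
        )
        return True

first = handle_once(conn, "m1", "acct-1", 10)   # applied
second = handle_once(conn, "m1", "acct-1", 10)  # redelivery, ignored
```

Keeping the inbox insert and the state change in one transaction is what makes this safe: a crash between them rolls both back, and the redelivered message is processed cleanly.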

The transactional outbox pattern solves the dual-write problem between the DB and the queue:

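  1. In the same DB transaction, write the business state and insert a row into an outbox table.
  2. A relay process publishes outbox rows and marks them published.
  3. The consumer dedupes by message id (inbox), because the relay can publish a row twice if it crashes before marking it.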

Without an outbox, you can lose events when the DB commits but the publish fails.
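
A minimal sketch of steps 1 and 2, using an in-memory SQLite database. The schema, `place_order`, `relay_once`, and the `publish` callback are hypothetical stand-ins; a real relay would call a broker client and typically run in its own process.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, state TEXT NOT NULL)")
db.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
)

def place_order(db, order_id):
    """Step 1: the business row and the outbox row commit atomically."""
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        event = json.dumps({"type": "order_placed", "order_id": order_id})
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (event,))

def relay_once(db, publish):
    """Step 2: publish pending rows in order, marking each one after success."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)  # if this raises, the row stays unpublished and is retried
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)

sent = []
place_order(db, "o-1")
relay_once(db, sent.append)  # a list stands in for the broker
```

Because the row is marked only after a successful publish, a crash in between leads to a second publish rather than a lost event, which is why the consumer-side inbox dedupe is still required.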

RabbitMQ notes:

  • Each queue lives on one node unless replicated (classic mirrored queues add replicas but are deprecated in favor of quorum queues, which use Raft).
  • Connections are heavyweight (one TCP connection each); keep one long-lived connection per process and multiplex lightweight channels over it.
  • Prefetch (basic.qos) limits unacknowledged in-flight messages per consumer; set it to a small number (10-50) for fairness.
  • tx.select (AMQP transactions) is slow; use publisher confirms instead.
  • Persistent messages + durable queue + publisher confirms + manual ack = strong durability.

SQS notes:

  • Visibility timeout is critical: the consumer must process and delete the message within the window or it reappears. Set it longer than worst-case processing time.
  • Long polling vs short polling: always use long polling (WaitTimeSeconds up to 20s) to cut empty receives, cost, and latency.
  • Standard queues may redeliver and reorder messages; design for both.
  • FIFO throughput limits are real and ordering serializes each message group, so pick a MessageGroupId with enough distinct values to spread load.
  • Cost is per request, so batching helps.
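
Redis Streams notes:

  • Use MAXLEN ~ N (approximate trimming) on XADD to cap memory.
  • A consumer group reading via XREADGROUP with the special ID > receives only messages never delivered to another consumer in the group.
  • Use XCLAIM with a min-idle-time to take over messages stuck in a crashed consumer's pending list.
  • AOF is needed for durability; with RDB only, writes since the last snapshot are lost on a crash.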
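
To see why a visibility timeout shorter than the processing time produces duplicates, here is a toy in-memory model. `VisibilityQueue` is purely illustrative and is not the SQS API; time is passed in explicitly as `now` so the behavior is deterministic. The idle-time claim in Redis Streams follows a loosely similar idea.

```python
class VisibilityQueue:
    """Toy queue: a received message is hidden until its timeout expires."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}          # message_id -> body
        self._invisible_until = {}   # message_id -> time it becomes visible again

    def send(self, message_id, body):
        self._messages[message_id] = body

    def receive(self, now):
        for message_id, body in self._messages.items():
            if self._invisible_until.get(message_id, 0.0) <= now:
                self._invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None  # nothing visible right now

    def delete(self, message_id):
        self._messages.pop(message_id, None)
        self._invisible_until.pop(message_id, None)

vq = VisibilityQueue(visibility_timeout=30)
vq.send("m1", "hello")
first = vq.receive(now=0)         # consumer A gets m1
hidden = vq.receive(now=10)       # still in flight: nothing returned
redelivered = vq.receive(now=31)  # A was too slow, so consumer B gets m1 too
```

If consumer A had called `delete("m1")` before the 30 seconds elapsed, the second delivery would never have happened.
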

Interview questions:

  1. You see message lag growing; how do you debug? Check: too few consumers? Slow processing? Slow downstream dependency? Hot partition?
  2. Two consumers got the same message — why and how to handle? Visibility timeout expired (SQS) or redelivery on broker restart. Idempotent consumer.
  3. Need strict order for user X’s events. Use partition / message group keyed by user id.
  4. A broker is down — what happens? Producer either errors immediately or buffers. Decide based on SLA. Many brokers have replicated/HA setups.
  5. How would you migrate from RabbitMQ to Kafka? Run both, dual publish from producers, switch consumers, eventually drop old.
  6. When to NOT use a queue? Synchronous user-facing op; tiny event volumes (DB row + cron is enough); strict global ordering at huge scale.
  7. Difference between fanout and pub/sub? Same idea (broadcast). Fanout exchange in RMQ; multiple SQS queues subscribed to one SNS topic.
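  8. What’s the difference between a topic in Kafka vs RabbitMQ? A Kafka topic is a partitioned log that consumers replay by offset. A RabbitMQ topic is an exchange type that routes messages by matching routing-key patterns.
  9. DLQ pattern: where does the retry counter live? Usually in message metadata: a header on the message, broker-maintained in some setups (e.g., ApproximateReceiveCount in SQS, the x-death header in RabbitMQ).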
  10. Why might you avoid Kafka exactly-once for cross-system writes? EOS only inside Kafka — DB writes still need idempotency.

Choosing a broker:

  • Already on AWS and a standard queue with at-least-once delivery is enough → SQS.
  • Need replay + analytics + huge volume → Kafka.
  • Need rich routing patterns and per-message ack → RabbitMQ.
  • Lightweight job queue, already on Redis → Redis Streams / BullMQ.
  • Edge / IoT / mixed messaging → NATS JetStream.
  • Multi-tenant geo-replicated → Pulsar.
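
Sizing rules of thumb:

  • Number of consumers ≤ number of partitions (or sharded queues); in Kafka, extra consumers in a group sit idle.
  • Visibility timeout = 2-3× p99 processing time.
  • Prefetch / poll batch size depends on per-message processing time × parallelism.
  • Worst-case backlog growth rate (the full ingest rate when consumers are stopped) × retention time = max disk needed per replica.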
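
The arithmetic behind these rules of thumb, as a small sketch. The function names, the default factor of 3, and the example workload figures are assumptions chosen only to make the numbers concrete.

```python
def visibility_timeout_s(p99_processing_s, factor=3.0):
    """Rule of thumb: 2-3x the p99 processing time."""
    return p99_processing_s * factor

def max_useful_consumers(partitions, desired):
    """Consumers beyond the partition count add no parallelism."""
    return min(partitions, desired)

def max_disk_bytes(ingest_msgs_per_s, avg_msg_bytes, retention_s, replication_factor=1):
    """Worst case: consumers stopped, so everything ingested is retained."""
    return ingest_msgs_per_s * avg_msg_bytes * retention_s * replication_factor

timeout = visibility_timeout_s(4.0)       # 4 s p99 -> 12 s timeout
consumers = max_useful_consumers(12, 20)  # 20 requested, only 12 can work
seven_days = 7 * 24 * 3600
disk = max_disk_bytes(5_000, 1_024, seven_days, replication_factor=3)
```

For this hypothetical workload (5,000 messages/s of 1 KiB each, 7 days of retention, 3 replicas) the result is 9,289,728,000,000 bytes, roughly 9.3 TB before any compression or headroom.
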

Anti-patterns:

  • Using a queue as a database.
  • Opening a DB connection per message (use a pool).
  • No DLQ.
  • Letting one slow message block all others (unbounded retries).
  • Mixing message types in one queue without versioning.
  • Synchronous request-reply over a queue when HTTP would do.
  • Skipping idempotency assuming “exactly once” delivery.