Kafka — Theory

Apache Kafka — Theory (interview deep-dive)

Why is Kafka fast?

Sequential disk I/O — append-only log, no random writes.
Page cache — OS handles caching; Kafka rarely calls fsync per record.
Zero-copy (sendfile) — kernel sends data from page cache directly to socket without user-space copy.
Batching + compression at producer.
Pull model — consumer paces itself; no broker-side per-consumer state for delivery.
Partitioning — horizontal parallelism.

Partition — design implications

More partitions = more parallelism, but:
- More open file handles per broker.
- Slower leader election if many partitions move during failover.
- Higher producer memory (one batch buffer per partition).
- Rebalance time grows with partition count.
Common range: hundreds per topic, thousands per cluster.
You can grow partitions but it changes key→partition mapping — breaks ordering for existing keys. Plan ahead.

Consumer group rebalance

Triggers: consumer joins / leaves, heartbeat timeout, partition added.

Eager rebalance (legacy) — all consumers stop, reassign, resume. Stop-the-world.
Cooperative (incremental) — only affected partitions reassigned. Less disruption.
Tunable: session.timeout.ms (broker considers consumer dead), max.poll.interval.ms (consumer must poll within this), heartbeat.interval.ms.

Exactly-once semantics

Kafka’s exactly-once is read-process-write within Kafka:

consume → process → produce → commit offsets

All of those (produce + commit offsets) wrapped in a Kafka transaction. Either everything commits or nothing.

Requirements:

enable.idempotence=true
isolation.level=read_committed on consumer
transactional.id on producer

Limit: external side effects (DB write, HTTP call, email) are NOT inside the transaction. For those, design the consumer to be idempotent (use message id or transactional outbox on DB side).

Idempotent producer

Producer assigned a producer id (PID); each message gets a sequence number per partition.
Broker dedupes retries by (PID, partition, sequence).
Default in 3.0+. Cheap; turn off only if you have a strong reason.

Replication & durability

replication.factor per topic (commonly 3).
min.insync.replicas — minimum ISR count required to ack acks=all writes. Set to 2 for RF=3.
If ISR drops below min.insync, producer with acks=all gets NotEnoughReplicas error → block.
Broker failure → followers eventually catch up; if a partition’s leader dies, controller picks new leader from ISR.

Compaction

Alternative to time/size-based deletion.
Keeps latest record per key forever (or until tombstone).
Tombstone: record with null value → instructs cleanup to remove key.
Use for: changelog streams, configuration snapshots, KTable state.
Set cleanup.policy=compact (or compact,delete for hybrid).

KRaft vs ZooKeeper

KRaft (Kafka Raft) replaces ZooKeeper — embedded controller quorum.
Faster failover, simpler ops, single tech to run.
Default in 3.5+, ZK removed in 4.0 (2025).

Offsets & lag

Lag = endOffset - committedOffset per partition for a group.
Monitor with kafka-consumer-groups --describe, Burrow, JMX, Prometheus.
Lag growth = consumers can’t keep up. Add consumers (up to partition count) or scale processing.

Common interview questions

How does Kafka guarantee message ordering? Within a partition; choose the right key.
What happens if a consumer crashes mid-processing? Offset wasn’t committed → re-read on restart → at-least-once → consumer must be idempotent.
How would you design exactly-once for Kafka → Postgres? Idempotent consumer (insert with ON CONFLICT, dedupe by message id), or transactional outbox keyed by message id.
Why are too many consumers (more than partitions) a problem? Idle consumers — partition is the unit of distribution.
What’s a poison pill, and how to handle? Bad message that makes consumer fail repeatedly. Solution: try N times → DLQ topic → alert / replay later.
acks=all vs acks=1 trade-offs? Latency vs durability.
How do you handle a backlog after outage? Scale consumers to partition count, increase fetch.max.bytes, batch processing, parallelize within consumer if possible.
What’s compaction for? Changelog topics, materialized state recovery.
Why might Kafka lag be high but CPU low? I/O bound (DB write per message), or single-threaded consumer code blocking.
What is the controller, and what happens if it dies? With KRaft, quorum elects new controller. Brief metadata unavailability.

Common pitfalls

Too many partitions — slow rebalances, bloated metadata.
Wrong key — bad parallelism (one hot key) or lost ordering (no key).
Not committing offsets correctly — duplicates or losses.
Big messages (>10MB) — bloat replication, memory pressure. Use object storage + key reference.
Using Kafka as a queue — works, but no message-level ack/redrive built-in (until KIP-1191 share groups, ~Kafka 4.4). RabbitMQ/SQS may fit better.
Schema chaos — producers and consumers drift apart. Use schema registry.
No DLQ — poison pill blocks topic forever.
Long processing per record — exceeds max.poll.interval.ms → rebalance loop.

Streams / processing topology basics

KStream (record stream) vs KTable (changelog → keyed table).
Stateful ops (join, aggregate) backed by RocksDB local state + changelog topic for fault tolerance.
Exactly-once processing achievable end-to-end within Streams.
Alternative: Apache Flink for richer semantics, lower latency.

Deep dive — Kafka fundamentals

Kafka is a distributed, replicated, append-only commit log. A broker is a single Kafka server; multiple brokers form a cluster. A topic is a named stream of records, but the physical unit is the partition — an ordered, immutable sequence of records. Each partition has one leader broker (handles all reads/writes) and N-1 followers that asynchronously replicate. The set of replicas currently caught up to the leader is the In-Sync Replica (ISR) set; only ISRs are eligible to be elected leader on failure.

Producers write records (key, value, headers, timestamp). Consumers read records by tracking a numeric offset per partition.

A consumer group (group.id) is the unit of horizontal scale: each partition in a subscribed topic is assigned to exactly one consumer in the group at a time. Offsets are committed per (group, topic, partition) and stored in the internal compacted topic __consumer_offsets. The log-based, append-only nature (vs. queue semantics) is what enables replay, multiple independent consumer groups, and stream processing — readers don’t destroy messages.

Gotchas

A topic is logical; a partition is physical. All scaling and ordering is per-partition.
More consumers than partitions = idle consumers (one consumer per partition max within a group).
__consumer_offsets is itself a compacted Kafka topic — losing it means losing all consumer positions.
Brokers don’t track which messages a consumer “got” — the consumer owns its offset. This is why replay is trivial.
Leader handles all I/O for a partition; “hot” partition keys can hotspot a single broker.

Q: Why is Kafka called a “log” and not a “queue”?

Records aren’t deleted on consumption; they persist by retention policy. Multiple consumer groups can read the same partition independently with their own offsets, and any group can rewind. Kafka is a replayable source-of-truth substrate, not a transient buffer like SQS or RabbitMQ.

Q: What’s in the ISR and why does it matter?

The ISR is the set of replicas (including the leader) within replica.lag.time.max.ms of the leader. With acks=all, a producer is acknowledged only when all ISRs persist the record. With min.insync.replicas=2 and RF=3, you can lose one broker without losing writes; if ISR drops below 2, producers get NotEnoughReplicasException rather than silently degrading durability.

Deep dive — partitions and partition keys

Ordering in Kafka is guaranteed only within a single partition, never across a topic. The default partitioner:

If key != null, partition = murmur2(key) % numPartitions.
If key == null, modern producers use the sticky partitioner (KIP-480) — batch fills one partition, then rotates — instead of pure round-robin, which improves batching efficiency.

Choosing a partition key is the single most consequential design decision: it controls both load distribution and the granularity of ordering. Common pattern: user_id or aggregate_id so all events for one entity stay ordered on one partition while different entities parallelise across the cluster.

Repartitioning is painful because changing partition count changes the hash mapping — the same key suddenly routes to a different partition, breaking ordering for in-flight aggregates and stateful consumers. Mitigation: over-provision partitions up front (target throughput / per-partition throughput, with headroom for 2–3× growth) — you can add consumers later but not safely re-key history.

Gotchas

Hot keys (e.g. tenant_id for one whale tenant) cause one partition + one broker + one consumer to bottleneck while the rest idle.
Adding partitions to an existing topic redistributes the future hash space — existing keys may move. Stateful apps must rebuild state.
Null key → sticky partitioner means ordering is effectively random; never rely on cross-partition order.
The default Java partitioner uses murmur2, KafkaJS defaults to a different hash; mixing producers in different languages on the same topic can cause keys to land on different partitions. Use partitioner: Partitioners.JavaCompatiblePartitioner.
Compaction requires non-null keys; null-keyed messages on a compacted topic are dropped.

Q: How would you partition an orders topic for an e-commerce site?

Key by customer_id (not order_id). Per-customer ordering is the meaningful invariant: status transitions (created → paid → shipped) for the same order must be strictly ordered, and same-customer orders typically share state (loyalty totals, fraud scoring). order_id would maximise parallelism but lose any per-customer aggregation guarantees and could let shipped overtake paid if processed on different partitions.

Q: Topic has 6 partitions, you have 10 consumers in one group. What happens?

6 consumers each get one partition; 4 sit idle as warm standby. They become useful only on rebalance (a consumer dies → one of the idle ones picks up its partition). To actually use 10 consumers you need ≥10 partitions. This is why partition count is a capacity-planning decision, not a tuning knob.

Deep dive — consumer groups and rebalancing

A consumer group’s partition assignment is coordinated by a group coordinator broker. A rebalance is triggered by: a member joining, a member leaving (graceful or via session.timeout.ms), partition count changing, or a topic subscription change.

Legacy “eager” (stop-the-world) rebalancing required all consumers to revoke all partitions, then re-receive an assignment — causing a global processing pause.

KIP-429 Cooperative Incremental Rebalancing (default since Kafka 3.0 via CooperativeStickyAssignor) only revokes partitions that actually need to move, and uses two consecutive rebalances so non-affected consumers keep processing throughout.

Static membership (KIP-345) via group.instance.id further reduces churn: a consumer that disconnects briefly (rolling deploy, k8s pod restart) keeps its assignment if it returns within session.timeout.ms, avoiding a rebalance entirely.

Gotchas

A handler that takes longer than max.poll.interval.ms (default 5 min) gets the consumer kicked out → rebalance → likely re-delivers the same message → loop.
Auto-commit (autoCommit: true with interval) commits up to the last poll — messages acknowledged-but-not-yet-processed can be lost on crash. Prefer manual commit after processing for at-least-once.
Mixing eager and cooperative assignors in one group requires a two-rolling-bounce upgrade — first deploy with both protocols, then remove eager.
group.instance.id must be unique per pod; reusing one across two live pods causes one to be fenced and never join.

Q: A k8s rolling deploy of 12 consumer pods causes 12 rebalances and processing latency spikes. How do you fix it?

Two layers: (1) enable static membership with group.instance.id per pod and tune session.timeout.ms higher than pod restart time (~45–60s) so brief disconnects don’t trigger rebalance; (2) ensure CooperativeStickyAssignor so the rebalances that do happen only revoke partitions that need to move. Combined, a rolling deploy goes from N global pauses to near-zero observable disruption.

Q: What’s the difference between session.timeout.ms and max.poll.interval.ms?

session.timeout.ms is the heartbeat liveness check (background thread); max.poll.interval.ms bounds how long the application thread can take between poll() calls. A consumer can be heartbeating fine but stuck in a slow handler — max.poll.interval.ms exists to evict zombies that can’t keep up. This split was introduced in KIP-62.

Deep dive — delivery semantics and exactly-once

Kafka offers three semantics:

At-most-once — commit offset before processing — crash loses message.
At-least-once — commit after processing — crash redelivers (the default safe mode).
Exactly-once (EOS) — built from two primitives.

Primitives of EOS

Idempotent producer (enable.idempotence=true, default since 3.0):

Each producer gets a Producer ID (PID).
Each batch has a per-partition monotonic sequence number.
The broker dedupes resends within a producer session, persisting the sequence in the replicated log so leader failover preserves dedup.

Transactions:

transactional.id provides cross-session continuity and zombie fencing (each initTransactions bumps an epoch; older instances with the same ID are rejected).
The producer atomically writes to multiple partitions via beginTransaction/commitTransaction/abortTransaction.
Crucially, sendOffsets writes the consumer offset commit into the same transaction, making the consume-process-produce loop atomic.

EOS is end-to-end inside Kafka only — it works because both data and offsets live in Kafka. The moment you write to an external system (Postgres, Stripe, S3), you’re back to at-least-once and the consumer must be idempotent (dedupe by event ID or use upserts). Consumers see transactional output only when reading with isolation.level=read_committed.

Gotchas

EOS does not extend to external sinks. A “transactional” Kafka producer + naive INSERT INTO postgres = at-least-once writes to Postgres. Use upserts, idempotency keys, or the outbox pattern.
transactional.id must be stable across restarts and unique per logical producer instance. Dynamic IDs (e.g. UUID per boot) break zombie fencing.
Consumers without isolation.level=read_committed see aborted transactional records and uncommitted in-flight ones.
EOS adds latency: transactional commits write markers to all involved partitions and the transaction log. Confluent benchmarks show ~3% throughput overhead at 100ms commit intervals, but tail latency rises with longer transactions.
enable.idempotence=true requires acks=all, retries > 0, max.in.flight.requests.per.connection <= 5. Conflicting configs throw at startup.

Q: Walk through how Kafka achieves exactly-once for a stream-processing job.

Three coordinated mechanisms: (1) Idempotent producer dedupes retries within a session via PID + sequence number. (2) Transactions atomically group writes across multiple partitions plus the offset commit into __consumer_offsets; a crash before commitTransaction aborts everything and the consumer reprocesses from the last committed offset. (3) Zombie fencing via transactional.id epoch ensures a restarted/duplicated instance can’t write — only the latest epoch holder is allowed. Consumers downstream must use read_committed to see only committed output. End result: each input record’s effects (output records + offset advance) are reflected exactly once in Kafka.

Q: My consumer writes to Postgres. Can I get exactly-once?

Not via Kafka transactions alone — they don’t span Postgres. Three viable approaches: (a) Idempotent consumer — include a unique event_id header, store processed IDs in a dedupe table, skip on conflict. (b) Upsert by natural key — make the operation itself idempotent (INSERT … ON CONFLICT DO UPDATE). (c) Transactional outbox — write the Postgres change and an outbox row in one DB transaction, then have Debezium tail the WAL and publish to Kafka. Option (c) is the cleanest answer for a strong audit story.

Deep dive — idempotency, DLQ, retry topics

At-least-once + idempotent consumer is the practical baseline for production. The idempotent consumer pattern stores (event_id, processed_at) in a dedupe table (or Redis with TTL) and skips on duplicate before the side effect.

Dead Letter Queues capture messages the consumer cannot process — bad schema, business-rule violation, persistent downstream failure — into a separate <topic>.DLQ topic with metadata headers (original topic/partition/offset, exception class, stack trace, attempt count) so they can be inspected, fixed, and optionally replayed.

Retry topics (Uber’s reliable-reprocessing pattern) solve the poison pill problem: a single bad message blocking head-of-line. Instead of retrying inline (which blocks the partition), the consumer publishes failures to a chain of retry topics with increasing delay (retry-5s → retry-30s → retry-5m → retry-1h → DLQ), each consumed by a worker that waits until now - message.timestamp >= delay before processing. This is non-blocking retry — the main consumer always advances.

Gotchas

DLQ without alerting is a black hole. Always alert on DLQ depth/rate.
Inline retry-with-sleep on the main consumer blocks the partition and risks max.poll.interval.ms eviction → rebalance storm. Always use a retry topic for delays > a few seconds.
Retry topics break per-key ordering. If ordering matters more than progress, use Confluent’s “maintain order” variant: park subsequent same-key events in an in-memory tombstone set until retry completes.
Dedupe table grows unbounded. Use TTL ≥ retention period × safety margin, or a Bloom filter for high-volume streams.
A “transient” failure that’s actually permanent (malformed JSON) cycles uselessly through every retry tier. Classify errors: route schema/parse errors directly to DLQ; only network/timeout errors enter retries.

Q: Walk me through your error-handling strategy for a Kafka consumer.

Three tiers. (1) Validation/parse errors → straight to DLQ (no point retrying bad data). (2) Transient errors (network, 5xx, timeout) → non-blocking retry topic chain with exponential delays (5s/30s/5m/1h), preserving the original message + error metadata in headers. (3) Persistent failures after final retry → DLQ with full provenance for human triage. All consumers are idempotent (event-ID dedupe) so retries are safe. DLQ depth is monitored with alerts; a runbook covers replay back to main topic after fix.

Q: What’s the poison-pill problem and how do you solve it?

A “poison pill” is a single message that always fails processing (bad schema, unhandled value), blocking head-of-line in its partition: the consumer retries forever, never commits offset, partition lag grows without bound, eventually max.poll.interval.ms evicts the consumer triggering a rebalance. Solution: bound retry attempts in-process, then publish to a DLQ or retry topic and commit the offset to advance. The poison message is preserved for inspection without holding up the entire partition.

Deep dive — event sourcing vs CDC vs outbox

These are constantly conflated; they are not the same thing.

Event sourcing is a domain modelling decision: the source of truth is the append-only sequence of business events (OrderPlaced, PaymentCaptured, OrderShipped); current state is derived by replaying events, optionally with periodic snapshots. It pairs naturally with CQRS because the event log is bad at queries — projections build read models. Benefits: 100% audit trail (critical for regulated domains), temporal queries (“state at time T”), natural fit with Kafka log compaction. Costs: steep learning curve, schema evolution of historical events is hard, eventual consistency on the read side.

CDC (Change Data Capture) is an integration technique: an existing system writes to its own DB (Postgres, MySQL, Mongo); a CDC tool reads the database’s transaction log (Postgres logical replication via pgoutput, MySQL binlog, Mongo oplog) and publishes row-level change events to Kafka. Debezium runs as a Kafka Connect source connector; on first run it takes a consistent snapshot, then streams changes from the log position. Each event includes op (c/u/d/r), before, after, and source metadata (LSN, transaction ID). CDC solves the dual-write problem — instead of “write to DB then publish to Kafka” (which can fail between the two), you write only to the DB and let CDC publish atomically from the WAL.

The transactional outbox sits between them: in the same DB transaction as your business write, insert a row into an outbox table; Debezium tails the outbox table and publishes to Kafka. Atomicity of business state and event publication, with Kafka as the integration bus, without forcing full event sourcing.

Gotchas

CDC events are row-level, not domain events — UPDATE users SET email=... becomes a row diff, not EmailChanged. Either translate downstream, or use the outbox to publish proper domain events.
Postgres logical replication slots retain WAL until consumed; an offline Debezium connector causes WAL to grow unboundedly and can fill disk. Monitor slot lag.
Schema evolution differs: in event sourcing you can never change historical events (they’re immutable facts); in CDC you re-snapshot to get the new schema for existing rows.
Event sourcing without snapshotting becomes pathologically slow on long-lived aggregates.
Pure event sourcing makes ad-hoc queries hard — CQRS is mandatory, not optional.

Q: When would you choose event sourcing vs CDC vs the outbox pattern?

Event sourcing when the business genuinely thinks in events, audit/temporal queries are first-class requirements, and you can invest in CQRS read-model rebuilding. Strong fit for regulated, financial, compliance-heavy domains. CDC when you have an existing system you can’t rewrite and need to fan its data out to Kafka without modifying app code — classic for legacy integration, replicating to data lakes, building search indexes. Transactional outbox when you have a service-oriented system, want reliable event publication without dual-write bugs, but don’t need the full event-sourcing commitment — pragmatic 80/20 sweet spot. Outbox + Debezium is the most common production pattern.

Q: What’s the dual-write problem and how does CDC solve it?

A service updates its DB then publishes to Kafka in two separate operations. If it crashes between them, DB and Kafka diverge — no distributed transaction, no atomicity. CDC eliminates the dual write: the service writes only to its DB; Debezium reads the WAL — already the durable, ordered, atomic source of truth — and publishes to Kafka. Either the DB transaction commits (and the change appears in WAL → Kafka) or it doesn’t.

Deep dive — Schema Registry compatibility

Schema Registry (Confluent or Apicurio) stores versioned schemas (Avro, Protobuf, JSON Schema) keyed by <topic>-value / <topic>-key subject. In Confluent’s wire format, producers serialise records as [magic byte][4-byte schema id][payload]; consumers fetch the schema by ID and deserialise. Decouples producers from consumers in time — you can evolve schemas without lockstep deploys, if the change respects the configured compatibility mode.

Compatibility modes

BACKWARD (default) — readers using the new schema can read data written with the previous schema; useful for consumer-first upgrades and replaying retained old data.
FORWARD — readers using the old schema can read data written with the new schema; useful when producers may deploy before consumers.
FULL — both directions, so either side can roll first for the allowed change set.
NONE — no checks (avoid).
*_TRANSITIVE variants check against all prior versions, not just the immediate predecessor — important for long-retained compacted topics where a brand-new consumer may encounter very old records.

Gotchas

Adding a required field (no default) breaks BACKWARD.
Renaming a field is a breaking change in Avro (use aliases to bridge).
Changing a field’s type is almost always breaking — even widening (int → long) is only forward-compatible, not backward.
Default BACKWARD (not BACKWARD_TRANSITIVE) only checks against the latest version. With long retention you may have a v3 consumer trying to read v1 data — set BACKWARD_TRANSITIVE.
JSON Schema is more permissive than Avro/Protobuf and easier to make accidentally incompatible — prefer Avro or Protobuf for strict contracts.

Q: How do you safely add a field to a Kafka event consumed by 5 services?

Add the field as optional with a default value so new consumers can read old records and old consumers can ignore new records. For shared topics, prefer FULL_TRANSITIVE if you cannot control rollout order; BACKWARD alone protects new consumers reading old data, but it does not prove old consumers can read new data. If the field must eventually be required, do a phased migration: (1) add as optional with default, (2) update all consumers, (3) backfill or wait until old data ages out, (4) make it required only in a major-version schema. Never add a required field without a default in a single deploy.

Q: Why use BACKWARD instead of FORWARD as default?

BACKWARD asks: “can my new consumer read data written by old producers?” That matters because Kafka keeps old records around for replay, backfills, and new consumer groups. It supports consumer-first upgrades and protects new code from retained history. FORWARD is the producer-first safety check: “can old consumers read data written by the new producer?” If rollout order is uncertain or the topic has many independent consumers, use FULL_TRANSITIVE rather than relying on only one direction.

Deep dive — operational config (durability)

Production durability hinges on a small set of coupled settings:

Replication factor (RF) = 3 — standard. Survives one broker loss without ISR shrinking below the durability threshold.
min.insync.replicas=2 with RF=3 — writes require 2 of 3 replicas to ack. Survives one broker down without rejecting writes; if two are down, producers get NotEnoughReplicasException (fail-fast over silent data loss).
acks=all on the producer — waits for all ISRs.
unclean.leader.election.enable=false — prevents an out-of-sync replica from being elected leader. The alternative is preferring availability over consistency and silently losing acknowledged writes.

Retention is per-topic via retention.ms (time, default 7 days) or retention.bytes (size).

Compaction (cleanup.policy=compact) is not time-based — it retains the latest value per key indefinitely and uses null-payload tombstones to mark deletes (purged after delete.retention.ms, default 24h). Compaction is the foundation for state topics, __consumer_offsets, Kafka Streams changelogs, and any topic representing “current state.” cleanup.policy=compact,delete combines both.

Gotchas

acks=all without min.insync.replicas >= 2 is a lie — if ISR shrinks to 1, “all” means “one,” durability is gone.
unclean.leader.election.enable=true trades durability for availability — acceptable for some use cases (metrics) but never for financial/regulatory data.
Compaction needs non-null keys; null-keyed messages on a compacted topic are silently dropped.
Tombstones must be retained long enough for all consumers to see them (delete.retention.ms) — too short and a stale consumer misses the delete and keeps obsolete state.
segment.bytes / segment.ms control when log segments roll; compaction only operates on closed segments, so very large segments delay compaction visibility.

Q: How would you configure a Kafka topic for a regulated financial event store?

RF=3 across three availability zones; min.insync.replicas=2; producer acks=all and enable.idempotence=true; unclean.leader.election.enable=false (consistency over availability — regulators will not accept silent data loss); long retention or infinite (retention.ms=-1) since events are the source of truth; cleanup.policy=delete (not compact — you need full history, not latest-per-key); enable encryption-at-rest and TLS in-transit; produce via transactional producer for atomic multi-partition writes; consume with isolation.level=read_committed.

Q: When do you use compaction vs time-based retention?

Compaction for state topics: latest value per key matters, history doesn’t (user profile cache, current inventory level, __consumer_offsets). Time-based for event topics: the sequence is the truth, you want N days of history then drop (audit log for 90 days, raw clickstream for analytics). Use both (compact,delete) for state with bounded staleness.

Deep dive — alternatives (when not to pick Kafka)

Broker	Sweet spot	Trade-off
RabbitMQ	Task queues, RPC, complex routing rules, traditional work distribution with low-to-medium throughput. Smart broker / dumb consumer. Rich AMQP routing (direct, fanout, topic, headers exchanges), per-message ack, push-based delivery, low-latency.	Doesn’t natively replay or store events long-term.
AWS SNS+SQS	AWS-native serverless (Lambda triggers), simple decoupling, no ops burden.	No ordering across queue (FIFO queues exist with throughput limits), no replay.
NATS JetStream	Edge / IoT / low-latency microservices where Kafka is overkill. Lightweight, microsecond latency, simpler ops, supports streams + KV + object store.	Smaller ecosystem than Kafka.
Redis Streams	Rate-limited work queues, real-time feeds within a single application. Fast, simple, in-process.	Bounded by memory; not a system-of-record.

Decision heuristic

Need replay + long retention + analytics + stream processing + high throughput + audit log? Kafka. Need complex routing, RPC, low-throughput task distribution? RabbitMQ. Already in AWS, want zero ops, simple decoupling? SNS/SQS. Need ultra-low latency or edge deployment with simpler ops? NATS. Caching layer or in-app queue? Redis Streams.

Q: Why pick Kafka over RabbitMQ for an event-driven system?

Three differentiators: (1) Replay — Kafka retains messages by policy, not consumption, so you can add a new consumer and replay a year of history to build a new read model; RabbitMQ deletes on ack. (2) Per-partition ordering with horizontal scale — Kafka partitions give you parallelism and ordering simultaneously; RabbitMQ queues serialize a single consumer or scale by losing order. (3) High throughput at low cost — Kafka’s append-only sequential I/O and zero-copy delivery handle millions of msgs/sec on modest hardware; RabbitMQ’s per-message ack and routing logic cap throughput an order of magnitude lower.

Q: Would you use Kafka for a request/response pattern?

Generally no. Kafka is optimised for asynchronous, high-throughput streaming; request/response over Kafka requires correlation IDs, response topics, and adds tail latency from batching/poll intervals. Use gRPC, HTTP, or RabbitMQ RPC for synchronous request/response.

Sources

Apache Kafka Documentation
Confluent: Exactly-Once Semantics Are Possible
Confluent: Transactions in Apache Kafka
KIP-429: Kafka Consumer Incremental Rebalance Protocol
Confluent: Avro Schema Evolution
Confluent: Kafka Streams Concepts
Confluent: Kafka Log Compaction
Confluent: Error Handling Patterns in Kafka
Uber Engineering: Reliable Reprocessing and DLQs
microservices.io: Event Sourcing pattern
microservices.io: Transactional Outbox pattern
Debezium Architecture
KafkaJS: Transactions
Kafka: The Definitive Guide, 2nd ed. — Shapira, Palino, Sivaram, Petty (O’Reilly)
Designing Data-Intensive Applications — Martin Kleppmann (O’Reilly), Ch. 11