Contents
- What is Kafka?
- Core Components
- Topics and Partitions
- Producers and Consumers
- Use Cases
- Kafka vs Traditional Message Queues
Kafka acts as a distributed commit log. Events (messages) are appended to topics and retained for a configurable period — consumers can replay from any offset. This differs from traditional queues where messages are deleted after consumption.
- Publish-Subscribe — many producers publish events; many consumer groups read independently.
- Durable — messages are written to disk and replicated across brokers.
- Scalable — topics are split into partitions distributed across a cluster.
- High-throughput — sequential disk writes + batching achieve millions of events/second.
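The commit-log model above can be sketched in a few lines of plain Python. The names here (`CommitLog`, `append`, `read_from`) are hypothetical, for illustration only; the point is that consumption never deletes anything, so any consumer can replay from any offset.

```python
class CommitLog:
    """Toy model of a Kafka partition: an append-only, replayable log."""

    def __init__(self):
        self._messages = []  # append-only; nothing is removed on consumption

    def append(self, message):
        offset = len(self._messages)  # each message gets a sequential offset
        self._messages.append(message)
        return offset

    def read_from(self, offset):
        # Replay: reading does not remove messages, so this can be called
        # repeatedly, from any starting offset.
        return self._messages[offset:]


log = CommitLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

print(log.read_from(0))  # full replay from the beginning
print(log.read_from(2))  # resume mid-stream from a later offset
```

A traditional queue, by contrast, would have to hand each message to exactly one consumer and then discard it.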
| Component | Role |
|---|---|
| Broker | A Kafka server that stores and serves messages. A cluster has multiple brokers for fault tolerance. |
| Topic | A named log/category where messages are published. Analogous to a database table. |
| Partition | A topic is split into one or more ordered, immutable partitions. Parallelism unit. |
| Producer | Application that publishes messages to a topic. |
| Consumer | Application that reads messages from a topic. |
| Consumer Group | A group of consumers that together consume a topic — each partition is assigned to exactly one consumer in the group. |
| Offset | A unique sequential ID for each message within a partition. Consumers track their position via offsets. |
| ZooKeeper / KRaft | Coordinates cluster metadata (controller election, topic configs). Kafka 3.x+ supports KRaft (built-in, removes the ZooKeeper dependency). |
A topic is a logical channel. A partition is an ordered log stored on a broker's disk (as a set of segment files). Partitions enable parallelism: multiple consumers in a group can read different partitions simultaneously.
- Messages within a partition are strictly ordered.
- Across partitions, there is no global ordering guarantee.
- A replication factor of N means N copies exist — one leader + (N-1) followers. If the leader fails, a follower takes over.
- Messages are routed to partitions by key hash (same key → same partition) or round-robin.
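Key-based routing can be sketched as a hash modulo the partition count. Real Kafka clients use murmur2 hashing; `zlib.crc32` stands in here as a stable hash, and the partition count is an assumed value.

```python
import zlib

NUM_PARTITIONS = 3  # illustrative; real topics choose this at creation time

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash of the key, modulo the partition count: the same key
    # always routes to the same partition, preserving per-key ordering.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Same key, same partition, every time within (and across) runs:
assert partition_for("user-42") == partition_for("user-42")
print(partition_for("user-42"), partition_for("user-99"))
```

Messages with no key skip this step and are spread round-robin (or, in newer clients, in "sticky" batches) across partitions, which is why keyless traffic has no per-key ordering guarantee.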
Producers write messages to topics. Key settings:
- acks — acknowledgment level: 0 (fire and forget), 1 (leader ack), all/-1 (leader + all in-sync replicas).
- retries — automatic retry on transient failures.
- batch.size — batch messages together for higher throughput.
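As a concrete sketch, these settings can be expressed as a config map whose keys match Kafka's producer configuration names; the values below are illustrative, not recommendations.

```python
# Producer settings keyed by Kafka's producer config names.
# Values are example choices, not tuned defaults.
producer_config = {
    "acks": "all",        # leader + all in-sync replicas must acknowledge
    "retries": 5,         # retry transient send failures automatically
    "batch.size": 32768,  # bytes batched per partition before a send
}

for name, value in producer_config.items():
    print(f"{name} = {value}")
```

Client libraries expose the same knobs, sometimes with spelling adapted to the host language (e.g. `batch_size` in Python clients).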
Consumers read messages from topics. Key concepts:
- Consumers belong to a consumer group. Each partition is assigned to exactly one consumer per group.
- Adding consumers to a group increases parallelism (up to the number of partitions).
- Consumers commit offsets to track their position. On restart, they resume from the committed offset.
- Commit strategies: auto-commit (simple but risk of reprocessing/skipping), manual sync/async commit (more control).
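The group mechanics above can be sketched with two small helpers, both hypothetical names: one divides partitions among group members (each partition to exactly one consumer, here via simple round-robin; real clients use pluggable assignors), and one resumes a consumer from its committed offset.

```python
def assign_partitions(partitions, consumers):
    # Round-robin assignment: each partition goes to exactly one consumer.
    # Consumers beyond len(partitions) would simply receive nothing.
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


# Last committed offset per (topic, partition), as a stand-in for the
# broker-side __consumer_offsets topic.
committed = {("events", 0): 128}

def resume_position(topic, partition):
    # On restart, a consumer resumes from its committed offset,
    # or from the beginning if nothing was ever committed.
    return committed.get((topic, partition), 0)


print(assign_partitions([0, 1, 2, 3], ["c1", "c2"]))
print(resume_position("events", 0))  # resumes where it left off
print(resume_position("events", 1))  # no commit yet: starts at 0
```

This also shows why parallelism is capped at the partition count: with four partitions, a fifth consumer in the group would sit idle.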
- Real-time analytics — stream user clicks, transactions, sensor data to analytics pipelines.
- Microservices communication — decouple services via events instead of synchronous REST calls.
- Log aggregation — centralise application and system logs from many services.
- Event sourcing — store every state change as an immutable event log.
- Data integration (ETL) — move data between databases, data warehouses, and search indexes.
- Stream processing — Kafka Streams or Apache Flink for real-time transformations.
| Feature | Kafka | Traditional Queue (RabbitMQ, SQS) |
|---|---|---|
| Message retention | Configurable (hours/days/forever) | Deleted after consumption |
| Replay | ✅ Yes — reset offset to re-read | ❌ No |
| Consumers | Multiple independent groups | Competing consumers (each message delivered to one) |
| Ordering | Per partition | Per queue (often) |
| Throughput | Very high (millions/sec) | Lower |
| Best for | Event streaming, analytics, log pipelines | Task queues, job dispatching |