Contents
- Topics
- Partitions
- Replication: Leaders and Followers
- Log Retention
- Log Compaction
- Managing Topics via AdminClient API
A topic is a named, append-only log. Producers write to it; consumers read from it. Topics are identified by name and can have multiple partitions.
- Topic names are case-sensitive and conventionally use lowercase and hyphens (e.g. `user-events`).
- Messages are not deleted on consumption; they are retained for a configurable period.
- Multiple consumer groups can read the same topic independently.
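The append-only and independent-reader properties above can be sketched with a toy in-memory model. This is not a Kafka client: `ToyTopic`, its methods, and the group names are hypothetical and exist only to illustrate the idea for a single partition.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a single-partition topic (not a Kafka client)."""

    def __init__(self, name):
        self.name = name
        self.log = []                    # append-only list of records
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def produce(self, record):
        self.log.append(record)          # records are only ever appended
        return len(self.log) - 1         # offset assigned to the new record

    def consume(self, group, max_records=10):
        start = self.offsets[group]
        batch = self.log[start:start + max_records]
        # Reading advances only this group's position; the log itself is untouched
        self.offsets[group] = start + len(batch)
        return batch


topic = ToyTopic("user-events")
for event in ["signup", "login", "logout"]:
    topic.produce(event)

analytics = topic.consume("analytics")   # reads all three records
billing = topic.consume("billing", 2)    # an independent group reads the first two
```

Both groups see the same records from offset 0, and the log still holds all three records afterwards, which is the behaviour the bullets describe.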
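A partition is an ordered, immutable sequence of records, each identified by a monotonically increasing offset. A partition is never split across brokers: each replica of a partition is stored in full on a single broker.
- Partitions are the unit of parallelism: within a consumer group, each partition is read by at most one consumer, so the partition count caps how many consumers in a group can work concurrently.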
- Within a partition, ordering is guaranteed. Across partitions, it is not.
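- Messages with the same key land in the same partition (hash-based routing), as long as the partition count does not change.
- Partition count can be increased but never decreased, and increasing it changes which partition a given key maps to. Choose carefully upfront.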
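A minimal sketch of hash-based routing, under stated assumptions: the Java client's default partitioner hashes the key bytes with murmur2, whereas this example uses CRC32 purely as a standard-library stand-in. `choose_partition` and the sample keys are hypothetical names for illustration.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in hash: CRC32 is deterministic and in the standard library.
    # It is NOT the hash Kafka uses; only the "hash mod partition count" shape matters here.
    return zlib.crc32(key) % num_partitions


keys = [f"user-{i}".encode() for i in range(1, 9)]

before = {k: choose_partition(k, 6) for k in keys}   # topic with 6 partitions
again = {k: choose_partition(k, 6) for k in keys}    # same key, same partition
after = {k: choose_partition(k, 8) for k in keys}    # partition count raised to 8

# Keys whose partition changed once the count was increased
moved = [k for k in keys if before[k] != after[k]]
```

Because the partition is the hash modulo the partition count, the mapping is stable only while the count is fixed. After an increase, new records for a key in `moved` would go to a different partition than its earlier records, so per-key ordering across the change is no longer guaranteed.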
Each partition has one leader and zero or more followers (determined by the replication factor).
- By default, all reads and writes go to the leader.
- Followers replicate data by fetching it from the leader asynchronously.
- In-Sync Replicas (ISR): the set of replicas that are caught up to the leader. With `acks=all`, a write is acknowledged only after every replica in the ISR has it.
- If the leader fails, one of the ISR followers is elected the new leader automatically.
| Replication Factor | Fault Tolerance | Recommended for |
|---|---|---|
| 1 | None (data lost if broker dies) | Dev/test only |
| 2 | Tolerates 1 broker failure | Non-critical data |
| 3 | Tolerates 2 broker failures | Production (standard) |
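The interaction between the ISR, `acks=all`, and leader failover described above can be sketched as a toy model. Everything here is an illustrative assumption: `ToyPartition` is hypothetical, and choosing the lowest broker id as the new leader is an arbitrary tie-break, not Kafka's actual election logic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    leader: int                             # broker id of the current leader
    isr: set = field(default_factory=set)   # in-sync broker ids (includes the leader)

    def write_committed(self, acked_by: set) -> bool:
        # With acks=all, a write counts only once every ISR member has it
        return self.isr <= acked_by

    def fail_broker(self, broker_id: int) -> None:
        self.isr.discard(broker_id)
        if broker_id == self.leader:
            if not self.isr:
                raise RuntimeError("no in-sync replica left to take over")
            # Hypothetical tie-break: lowest broker id still in the ISR
            self.leader = min(self.isr)


p = ToyPartition(leader=1, isr={1, 2, 3})   # replication factor 3
partial = p.write_committed({1, 2})         # broker 3 has not acknowledged yet
full = p.write_committed({1, 2, 3})         # every ISR member has the record
p.fail_broker(1)                            # the leader dies; an ISR follower takes over
```

In this model the partition keeps a leader through two broker failures and raises only on the third, which matches the replication-factor-3 row of the table.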
Kafka retains messages for a configurable period regardless of whether they were consumed. Two retention modes:
- Time-based (`retention.ms`): delete messages older than the configured time (default: 7 days).
- Size-based (`retention.bytes`): delete the oldest segments when the partition size exceeds the limit (default: no size limit).
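The two retention modes can be sketched over a list of log segments. This is a deliberate simplification: `Segment` and `apply_retention` are hypothetical names, and details of the real broker (such as how the active segment is handled, and the exact size rule) are left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    newest_ts_ms: int   # timestamp of the newest record in the segment
    size_bytes: int


def apply_retention(segments, now_ms, retention_ms=None, retention_bytes=None):
    """Return the segments that survive; `segments` is ordered oldest first."""
    kept = list(segments)
    if retention_ms is not None:
        # Time-based: drop segments whose newest record is older than the limit
        kept = [s for s in kept if now_ms - s.newest_ts_ms <= retention_ms]
    if retention_bytes is not None:
        # Size-based: drop the oldest segments until the partition fits the limit
        while kept and sum(s.size_bytes for s in kept) > retention_bytes:
            kept.pop(0)
    return kept


DAY_MS = 24 * 60 * 60 * 1000
now = 10 * DAY_MS
segments = [
    Segment(newest_ts_ms=1 * DAY_MS, size_bytes=500),   # 9 days old
    Segment(newest_ts_ms=5 * DAY_MS, size_bytes=500),   # 5 days old
    Segment(newest_ts_ms=9 * DAY_MS, size_bytes=500),   # 1 day old
]

by_time = apply_retention(segments, now, retention_ms=7 * DAY_MS)
by_size = apply_retention(segments, now, retention_bytes=600)
```

With a 7-day limit only the 9-day-old segment is removed; with a 600-byte limit the two oldest segments are removed and only the newest remains. In both cases deletion works on whole segments, oldest first.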
Log compaction is an alternative to time- or size-based deletion, enabled per topic with `cleanup.policy=compact`. Instead of discarding old data by age, Kafka retains at least the latest record for each key and eventually removes older records with the same key. This is ideal for maintaining a changelog or snapshot of the latest state per entity.
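The effect of compaction can be sketched as a pure function over a list of keyed records. This shows only the end result (the last record per key survives, with its original offset), not how or when the broker actually cleans segments; `compact` and the sample data are hypothetical.

```python
def compact(records):
    """Keep only the last record per key, preserving original offsets and order."""
    latest_offset = {}
    for offset, (key, _value) in enumerate(records):
        latest_offset[key] = offset   # later records overwrite earlier ones
    return [
        (offset, key, value)
        for offset, (key, value) in enumerate(records)
        if latest_offset[key] == offset
    ]


log = [
    ("user-1", "alice@old.example"),   # offset 0, superseded by offset 2
    ("user-2", "bob@example.com"),     # offset 1, superseded by offset 4
    ("user-1", "alice@new.example"),   # offset 2
    ("user-3", "carol@example.com"),   # offset 3
    ("user-2", "bob@new.example"),     # offset 4
]

compacted = compact(log)
```

Here `compacted` holds offsets 2, 3, and 4: one record per key, each carrying the latest value. A consumer replaying the compacted log from the beginning would rebuild the current state per key without reading the superseded records.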