Kafka - Topics and Partitions

Topics
Partitions
Replication — Leaders and Followers
Log Retention
Log Compaction
Managing Topics via AdminClient API

Topics

A topic is a named, append-only log. Producers write to it; consumers read from it. Topics are identified by name and can have multiple partitions.

Topic names are case-sensitive and conventionally use lowercase and hyphens (e.g. user-events).
Messages are not deleted on consumption — they are retained for a configurable period.
Multiple consumer groups can read the same topic independently.
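
The points above can be sketched with a toy in-memory model. This is not the Kafka client API; the class `TopicLogSketch` and its methods are hypothetical names chosen for illustration. It only shows that appending assigns offsets, that reading does not remove records, and that each consumer group tracks its own position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a single-partition topic; hypothetical, not part of Kafka.
class TopicLogSketch {
    private final List<String> log = new ArrayList<>();               // append-only records
    private final Map<String, Integer> groupOffsets = new HashMap<>(); // next offset per group

    // Producers append; the returned value is the new record's offset.
    int append(String record) {
        log.add(record);
        return log.size() - 1;
    }

    // Each group reads from its own position; reading never removes records.
    List<String> poll(String groupId) {
        int from = groupOffsets.getOrDefault(groupId, 0);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        groupOffsets.put(groupId, log.size());
        return batch;
    }

    int size() {
        return log.size();
    }

    public static void main(String[] args) {
        TopicLogSketch topic = new TopicLogSketch();
        topic.append("signup");
        topic.append("login");
        System.out.println(topic.poll("billing"));   // [signup, login]
        System.out.println(topic.poll("analytics")); // [signup, login]
        System.out.println(topic.size());            // 2
    }
}
```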

Partitions

A partition is an ordered, immutable sequence of records, each identified by a monotonically increasing offset. Each replica of a partition lives on a single broker, and at any time exactly one replica acts as the partition's leader.

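Partitions are the unit of parallelism: within a consumer group, each partition is assigned to at most one consumer, so the partition count caps how many consumers can read concurrently.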
Within a partition, ordering is guaranteed. Across partitions, it is not.
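With the default partitioner, messages with the same key land in the same partition (hash-based routing) as long as the partition count does not change.
Partition count can be increased but never decreased, and increasing it changes which partition a given key maps to. Choose carefully upfront.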
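
Hash-based routing can be illustrated with a short sketch. Kafka's default partitioner hashes the serialized key with murmur2; the sketch below substitutes `Arrays.hashCode` purely to stay dependency-free, so the partition numbers it produces will not match a real cluster. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: shows the "hash, then modulo partition count" idea.
class KeyRoutingSketch {
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);        // stand-in for murmur2 (assumption)
        return (hash & 0x7fffffff) % numPartitions;  // clear the sign bit, then take the modulo
    }

    public static void main(String[] args) {
        int first = partitionFor("user-42", 6);
        int second = partitionFor("user-42", 6);
        System.out.println(first == second);             // true
        // Same key, different partition count: the result may differ from `first`.
        System.out.println(partitionFor("user-42", 8));
    }
}
```

Because the partition is the hash modulo the partition count, adding partitions can move a key to a different partition, which is why per-key ordering is only guaranteed while the count stays fixed.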

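A reasonable starting point is 1–3 partitions per broker per topic. For high-throughput topics, scale that up so the partition count is at least the maximum number of consumers you expect in a single group; consumers beyond the partition count sit idle.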

Replication — Leaders and Followers

Each partition has one leader and zero or more followers (determined by the replication factor).

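By default, all reads and writes go to the leader (since Kafka 2.4, consumers can optionally be configured to fetch from followers).
Followers replicate data from the leader asynchronously by fetching from it.
In-Sync Replicas (ISR) are the replicas, leader included, that are caught up to the leader. With acks=all, the leader acknowledges a write only after every replica currently in the ISR has it.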
If the leader fails, one of the ISR followers is elected the new leader automatically.
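
The leader, ISR, and failover behaviour described above can be modelled in a few lines. This is a deliberately simplified sketch, not how the Kafka controller is implemented: `IsrSketch` is a hypothetical class, broker ids are plain integers, and the new leader is simply the first remaining in-sync replica.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of one partition's replica state; hypothetical, not Kafka code.
class IsrSketch {
    private int leader;
    private final Set<Integer> isr = new LinkedHashSet<>(); // broker ids, leader included

    IsrSketch(int leader, int... followers) {
        this.leader = leader;
        isr.add(leader);
        for (int follower : followers) {
            isr.add(follower);
        }
    }

    // A follower that lags too far behind is dropped from the ISR.
    void fallBehind(int brokerId) {
        if (brokerId != leader) {
            isr.remove(brokerId);
        }
    }

    // With acks=all, every current ISR member must have the record.
    int acksRequired() {
        return isr.size();
    }

    // On leader failure, promote a remaining in-sync replica.
    void failLeader() {
        isr.remove(leader);
        if (isr.isEmpty()) {
            throw new IllegalStateException("no in-sync replica available");
        }
        leader = isr.iterator().next();
    }

    int leader() {
        return leader;
    }

    public static void main(String[] args) {
        IsrSketch partition = new IsrSketch(1, 2, 3);
        System.out.println(partition.acksRequired()); // 3
        partition.fallBehind(3);
        System.out.println(partition.acksRequired()); // 2
        partition.failLeader();
        System.out.println(partition.leader());       // 2
    }
}
```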

Replication Factor	Fault Tolerance	Recommended for
1	None (data lost if broker dies)	Dev/test only
2	Tolerates 1 broker failure	Non-critical data
3	Tolerates 2 broker failures	Production (standard)
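
The fault-tolerance column follows from simple arithmetic, sketched below. The second method goes slightly beyond the table: it assumes the topic also sets `min.insync.replicas` (a real Kafka setting that this text does not otherwise cover), in which case acks=all writes keep succeeding only while at least that many replicas remain in sync. The class and method names are hypothetical.

```java
// Arithmetic behind the replication-factor table; illustrative helper only.
class ReplicationMathSketch {
    // Broker failures a partition can survive while keeping at least one replica.
    static int toleratedFailures(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor - 1;
    }

    // Broker failures after which acks=all writes are still accepted,
    // assuming min.insync.replicas is configured (assumption, see lead-in).
    static int toleratedFailuresForWrites(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException("need 1 <= min.insync.replicas <= replication factor");
        }
        return replicationFactor - minInsyncReplicas;
    }

    public static void main(String[] args) {
        System.out.println(toleratedFailures(1));             // 0
        System.out.println(toleratedFailures(2));             // 1
        System.out.println(toleratedFailures(3));             // 2
        System.out.println(toleratedFailuresForWrites(3, 2)); // 1
    }
}
```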

Log Retention

Kafka retains messages for a configurable period regardless of whether they were consumed. Two retention modes:

Time-based (retention.ms) — delete messages older than the configured time (default: 7 days).
Size-based (retention.bytes) — delete oldest segments when partition size exceeds limit.

    # Create topic with 1-day retention
    kafka-topics.sh --create \
      --topic user-events \
      --partitions 6 \
      --replication-factor 3 \
      --config retention.ms=86400000 \
      --bootstrap-server localhost:9092
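
Retention values are given in milliseconds and bytes, which are easy to mistype. One way to avoid hand-computed constants is to derive them, as in this sketch; `RetentionConfigSketch` and `retentionConfig` are hypothetical helper names, and the 1 GiB size limit is an arbitrary example value, not a recommendation. The resulting map has the shape that topic-level configs take.

```java
import java.time.Duration;
import java.util.Map;

// Builds topic-level retention settings from readable units; illustrative helper.
class RetentionConfigSketch {
    static Map<String, String> retentionConfig(Duration maxAge, long maxBytesPerPartition) {
        return Map.of(
                "retention.ms", Long.toString(maxAge.toMillis()),
                "retention.bytes", Long.toString(maxBytesPerPartition));
    }

    public static void main(String[] args) {
        Map<String, String> config = retentionConfig(Duration.ofDays(1), 1024L * 1024 * 1024);
        System.out.println(config.get("retention.ms"));    // 86400000
        System.out.println(config.get("retention.bytes")); // 1073741824
    }
}
```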

Log Compaction

Log compaction is an alternative to time-based deletion. Instead of deleting old data by time, Kafka keeps only the latest record for each key. Older records with the same key are removed. This is ideal for maintaining a changelog or snapshot of the latest state per entity.

    # Enable log compaction on a topic
    kafka-topics.sh --create \
      --topic user-profiles \
      --partitions 3 \
      --replication-factor 3 \
      --config cleanup.policy=compact \
      --bootstrap-server localhost:9092

Compaction is asynchronous and does not guarantee immediate removal. Records with null values (tombstones) signal deletion of a key.
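
The effect of compaction on a log can be sketched as a pure function. This is a simplified model under stated assumptions: it compacts the whole log in one pass and drops tombstones immediately, whereas a real broker compacts in the background and keeps tombstones around for a while before removing them. `CompactionSketch` and `Rec` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of log compaction; not Kafka's log cleaner.
class CompactionSketch {
    static final class Rec {
        final String key;
        final String value; // null marks a tombstone

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    static List<Rec> compact(List<Rec> log) {
        Map<String, Rec> latest = new LinkedHashMap<>();
        for (Rec rec : log) {
            latest.remove(rec.key);   // re-insert so survivors keep their relative log order
            latest.put(rec.key, rec);
        }
        List<Rec> compacted = new ArrayList<>();
        for (Rec rec : latest.values()) {
            if (rec.value != null) {  // a key whose latest record is a tombstone is dropped
                compacted.add(rec);
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<Rec> log = List.of(
                new Rec("u1", "alice@old.example"),
                new Rec("u2", "bob@example"),
                new Rec("u1", "alice@new.example"),
                new Rec("u2", null));
        for (Rec rec : compact(log)) {
            System.out.println(rec.key + " -> " + rec.value); // u1 -> alice@new.example
        }
    }
}
```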

Managing Topics via AdminClient API

    import org.apache.kafka.clients.admin.*;
    import java.util.*;

    public class TopicAdminExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            // AdminClient is AutoCloseable; try-with-resources closes it when done.
            try (AdminClient admin = AdminClient.create(props)) {
                // Create a topic with 6 partitions and replication factor 3
                NewTopic newTopic = new NewTopic("orders", 6, (short) 3);
                newTopic.configs(Map.of("retention.ms", "604800000")); // 7 days
                admin.createTopics(List.of(newTopic)).all().get();

                // List topics
                Set<String> topics = admin.listTopics().names().get();
                System.out.println("Topics: " + topics);

                // Describe a topic (allTopicNames() needs kafka-clients 3.1+;
                // older clients use all() instead)
                Map<String, TopicDescription> desc =
                        admin.describeTopics(List.of("orders")).allTopicNames().get();
                desc.forEach((name, td) -> System.out.println(name + ": " + td));

                // Delete a topic (fails if "old-topic" does not exist)
                admin.deleteTopics(List.of("old-topic")).all().get();
            }
        }
    }
