
The naive approach publishes to Kafka inside (or just after) the database transaction:

```java
// ❌ UNSAFE — not atomic: crash between save and send leaves them inconsistent
@Transactional
public void createOrder(CreateOrderRequest req) {
    Order order = orderRepo.save(new Order(req));
    kafka.send("order.created", new OrderCreatedEvent(order.getId())); // could fail!
}
```

If kafka.send() fails (broker down, timeout), the event is lost. If it succeeds but the transaction rolls back afterward, a ghost event is published for an order that never existed. Neither failure mode is acceptable in a distributed system.

Execute the SQL statements below against your target database. Review each statement before running in production.

```sql
CREATE TABLE outbox_events (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR(100) NOT NULL,              -- e.g. 'Order'
    aggregate_id   VARCHAR(100) NOT NULL,              -- e.g. the order ID
    event_type     VARCHAR(100) NOT NULL,              -- e.g. 'OrderCreatedEvent'
    payload        JSONB NOT NULL,                     -- serialized event body
    topic          VARCHAR(200) NOT NULL,              -- target Kafka topic
    created_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    sent_at        TIMESTAMPTZ,                        -- NULL = not yet relayed
    status         VARCHAR(20) NOT NULL DEFAULT 'PENDING'  -- PENDING | SENT | FAILED
);

CREATE INDEX idx_outbox_pending
    ON outbox_events (status, created_at)
    WHERE status = 'PENDING';
```

Write the outbox row in the same transaction as the business entity. If the transaction rolls back, the outbox row is also rolled back — guaranteed consistency.

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderService {

    private final OrderRepository orderRepo;
    private final OutboxRepository outboxRepo;
    private final ObjectMapper mapper;

    public OrderService(OrderRepository orderRepo, OutboxRepository outboxRepo,
                        ObjectMapper mapper) {
        this.orderRepo = orderRepo;
        this.outboxRepo = outboxRepo;
        this.mapper = mapper;
    }

    @Transactional // both writes share one ACID transaction
    public Order createOrder(CreateOrderRequest req) {
        Order order = orderRepo.save(new Order(req));
        OutboxEvent event = OutboxEvent.builder()
                .aggregateType("Order")
                .aggregateId(String.valueOf(order.getId()))
                .eventType("OrderCreatedEvent")
                .topic("order.created")
                .payload(serialize(new OrderCreatedEvent(order.getId(), order.getCustomerId(),
                        order.getProductId(), order.getQuantity())))
                .status("PENDING")
                .build();
        outboxRepo.save(event); // same transaction — atomic!
        return order;
    }

    private String serialize(Object event) {
        try {
            return mapper.writeValueAsString(event);
        } catch (JsonProcessingException ex) {
            // a serialization failure must abort the transaction, not silently drop the event
            throw new IllegalStateException("Could not serialize outbox payload", ex);
        }
    }
}
```

A scheduled job polls the outbox table for PENDING rows, publishes them to Kafka, and marks them SENT. Simple to implement, works with any database.

```java
import java.time.Instant;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class OutboxPoller {

    private static final Logger log = LoggerFactory.getLogger(OutboxPoller.class);

    private final OutboxRepository outboxRepo;
    private final KafkaTemplate<String, String> kafka;

    public OutboxPoller(OutboxRepository outboxRepo, KafkaTemplate<String, String> kafka) {
        this.outboxRepo = outboxRepo;
        this.kafka = kafka;
    }

    @Scheduled(fixedDelay = 500) // poll every 500 ms
    @Transactional
    public void pollAndPublish() {
        List<OutboxEvent> pending =
                outboxRepo.findTop50ByStatusOrderByCreatedAtAsc("PENDING");
        for (OutboxEvent event : pending) {
            try {
                kafka.send(event.getTopic(), event.getAggregateId(), event.getPayload())
                     .get(5, TimeUnit.SECONDS); // wait for broker ack
                event.setStatus("SENT");
                event.setSentAt(Instant.now());
            } catch (Exception ex) {
                event.setStatus("FAILED");
                log.error("Failed to publish outbox event id={}", event.getId(), ex);
            }
            outboxRepo.save(event);
        }
    }
}
```

When running multiple instances, use SELECT ... FOR UPDATE SKIP LOCKED in the outbox query to prevent duplicate publishing across pods. In Spring Data JPA, add `@Lock(LockModeType.PESSIMISTIC_WRITE)` together with a `@QueryHint` setting `jakarta.persistence.lock.timeout` to `-2`, Hibernate's SKIP_LOCKED value.
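Under those settings, the poll query the ORM issues against PostgreSQL looks roughly like the following (a sketch; the exact SQL varies by dialect and generated method):

```sql
-- Claim up to 50 pending rows; rows already locked by another poller are skipped,
-- so two pods never pick up the same event
SELECT *
FROM outbox_events
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 50
FOR UPDATE SKIP LOCKED;
```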

Debezium tails the database's transaction log (the write-ahead log, or WAL, in PostgreSQL) and streams every INSERT into the outbox table directly to Kafka — zero polling, sub-second latency, no lock contention.

```yaml
# docker-compose.yml — Debezium connector for PostgreSQL outbox
version: "3.9"
services:
  debezium:
    image: debezium/connect:2.7
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: debezium-outbox
      CONFIG_STORAGE_TOPIC: debezium.configs
      OFFSET_STORAGE_TOPIC: debezium.offsets
      STATUS_STORAGE_TOPIC: debezium.status
    ports:
      - "8083:8083"
```

Register the connector with a POST to `http://debezium:8083/connectors`. Note `route.topic.replacement`: without it, Debezium's EventRouter prefixes the routed value with `outbox.event.` by default, so it is set here to use the `topic` column's value verbatim.

```json
{
  "name": "outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "app",
    "database.password": "secret",
    "database.dbname": "appdb",
    "table.include.list": "public.outbox_events",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.route.by.field": "topic",
    "transforms.outbox.route.topic.replacement": "${routedByValue}",
    "transforms.outbox.table.field.event.id": "id",
    "transforms.outbox.table.field.event.key": "aggregate_id",
    "transforms.outbox.table.field.event.payload": "payload",
    "transforms.outbox.table.field.event.type": "event_type"
  }
}
```
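With the stack above running, the registration can be done from the command line (a sketch; the hostname and port match the compose file, and the connector JSON is assumed saved as `outbox-connector.json`):

```
# Register the outbox connector with the Kafka Connect REST API
curl -X POST http://localhost:8083/connectors \
     -H "Content-Type: application/json" \
     -d @outbox-connector.json

# Verify the connector is running
curl http://localhost:8083/connectors/outbox-connector/status
```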

Debezium's EventRouter SMT (Single Message Transform) reads the outbox row and routes it to the Kafka topic stored in the topic column. No code change needed in the Spring application — just insert to the outbox table.

```sql
-- Debezium reads the WAL and sees this INSERT — the EventRouter SMT
-- extracts topic='order.created', key='42', payload='{...}'
-- and produces a Kafka record on topic 'order.created' with key '42'
INSERT INTO outbox_events (aggregate_type, aggregate_id, event_type, topic, payload)
VALUES ('Order', '42', 'OrderCreatedEvent', 'order.created', '{"orderId":42,...}');
```

To prevent unbounded table growth, delete sent rows periodically:

```sql
-- Run via pg_cron or a Spring @Scheduled job
-- Filter on status so unsent (PENDING/FAILED) rows are never dropped
DELETE FROM outbox_events
WHERE status = 'SENT'
  AND created_at < NOW() - INTERVAL '7 days';
```
| Aspect | Polling publisher | Debezium CDC |
| --- | --- | --- |
| Latency | Up to polling interval (500 ms–5 s) | Sub-second (WAL streaming) |
| Throughput | Limited by poll batch size | Very high — streams the WAL |
| DB load | Regular SELECT + UPDATE queries | Replication slot — minimal |
| Setup complexity | Low — just a scheduler | Medium — Kafka Connect + connector |
| Ops complexity | Low | Medium — manage connector lifecycle |
| Database support | Any | PostgreSQL, MySQL, MongoDB, SQL Server |
| Exactly-once to Kafka | At-least-once (needs idempotent consumer) | At-least-once (same) |

Start with the polling publisher for simplicity. Switch to Debezium CDC when latency or DB query load becomes a concern.
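Whichever relay you choose, delivery to Kafka is at-least-once, so consumers must deduplicate on the outbox event ID. A minimal sketch of an idempotent handler (the class and method names are illustrative, and the in-memory set stands in for what would normally be a processed-events table or cache):

```java
import java.util.HashSet;
import java.util.Set;

public class IdempotentConsumer {

    // In production this would be a persistent store keyed by the outbox event UUID
    private final Set<String> processedIds = new HashSet<>();

    /** Runs the handler once per event ID; returns false for a redelivered duplicate. */
    public boolean process(String eventId, Runnable handler) {
        if (!processedIds.add(eventId)) {
            return false; // already seen — skip the redelivery
        }
        handler.run();
        return true;
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        boolean first = consumer.process("evt-1", () -> System.out.println("handled evt-1"));
        boolean second = consumer.process("evt-1", () -> System.out.println("handled again?"));
        System.out.println(first + " " + second); // prints "true false"
    }
}
```

The same check can be pushed into the database with an `INSERT ... ON CONFLICT DO NOTHING` on a processed-events table inside the consumer's own transaction.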