Contents
- The Dual-Write Problem
- Outbox Table Schema
- Writing to the Outbox Atomically
- Approach 1 — Polling Publisher
- Approach 2 — Debezium CDC
- Debezium Outbox Event Router
- Polling vs CDC Comparison
## The Dual-Write Problem

The naive approach publishes to Kafka inside (or just after) the database transaction:
```java
// ❌ UNSAFE — not atomic: a crash between save and send leaves DB and Kafka inconsistent
@Transactional
public void createOrder(CreateOrderRequest req) {
    Order order = orderRepo.save(new Order(req));
    kafka.send("order.created", new OrderCreatedEvent(order.getId())); // could fail!
}
```
If kafka.send() fails (broker down, timeout), the event is lost even though the order was saved. If the send succeeds but the transaction rolls back afterward, a ghost event is published for an order that never existed. Neither failure mode is acceptable in a distributed system.
## Outbox Table Schema

The schema below targets PostgreSQL (it uses JSONB and gen_random_uuid()); review each statement before running it in production.
```sql
CREATE TABLE outbox_events (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR(100) NOT NULL,               -- e.g. 'Order'
    aggregate_id   VARCHAR(100) NOT NULL,               -- e.g. the order ID
    event_type     VARCHAR(100) NOT NULL,               -- e.g. 'OrderCreatedEvent'
    payload        JSONB        NOT NULL,               -- serialized event body
    topic          VARCHAR(200) NOT NULL,               -- target Kafka topic
    created_at     TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
    sent_at        TIMESTAMPTZ,                         -- NULL = not yet relayed
    status         VARCHAR(20)  NOT NULL DEFAULT 'PENDING' -- PENDING | SENT | FAILED
);

CREATE INDEX idx_outbox_pending ON outbox_events(status, created_at)
    WHERE status = 'PENDING';
```
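Mapped with JPA, the table above could look like the following sketch. This is an assumption-laden illustration, not code from the article: it presumes Lombok (for the builder used by the service code later) and Hibernate 6 (for the JSONB mapping); annotation details may differ in your setup.

```java
import java.time.Instant;
import java.util.UUID;

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.type.SqlTypes;

// Hypothetical entity for the outbox_events table (names match the
// service code in this article; the mapping itself is illustrative).
@Entity
@Table(name = "outbox_events")
@Getter @Setter @Builder
@NoArgsConstructor @AllArgsConstructor
public class OutboxEvent {

    @Id
    @GeneratedValue
    private UUID id;

    @Column(name = "aggregate_type", nullable = false)
    private String aggregateType;

    @Column(name = "aggregate_id", nullable = false)
    private String aggregateId;

    @Column(name = "event_type", nullable = false)
    private String eventType;

    @JdbcTypeCode(SqlTypes.JSON) // maps the JSONB column under Hibernate 6
    @Column(nullable = false)
    private String payload;

    @Column(nullable = false)
    private String topic;

    @Builder.Default
    @Column(name = "created_at", nullable = false)
    private Instant createdAt = Instant.now();

    @Column(name = "sent_at")
    private Instant sentAt;

    @Column(nullable = false)
    private String status;
}
```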
## Writing to the Outbox Atomically

Write the outbox row in the same transaction as the business entity. If the transaction rolls back, the outbox row is also rolled back — guaranteed consistency.
```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
@RequiredArgsConstructor // constructor injection for the final fields
public class OrderService {

    private final OrderRepository orderRepo;
    private final OutboxRepository outboxRepo;
    private final ObjectMapper mapper;

    @Transactional // both writes share one ACID transaction
    public Order createOrder(CreateOrderRequest req) {
        Order order = orderRepo.save(new Order(req));
        OutboxEvent event = OutboxEvent.builder()
                .aggregateType("Order")
                .aggregateId(String.valueOf(order.getId()))
                .eventType("OrderCreatedEvent")
                .topic("order.created")
                .payload(serialize(new OrderCreatedEvent(order.getId(), order.getCustomerId(),
                        order.getProductId(), order.getQuantity())))
                .status("PENDING")
                .build();
        outboxRepo.save(event); // same transaction — atomic!
        return order;
    }

    private String serialize(Object event) {
        try {
            return mapper.writeValueAsString(event);
        } catch (JsonProcessingException ex) {
            // rethrow unchecked so the transaction rolls back cleanly
            throw new IllegalStateException("Could not serialize outbox payload", ex);
        }
    }
}
```
## Approach 1 — Polling Publisher

A scheduled job polls the outbox table for PENDING rows, publishes them to Kafka, and marks them SENT. It is simple to implement and works with any database.
```java
import java.time.Instant;
import java.util.List;
import java.util.concurrent.TimeUnit;

import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Slf4j
@Component
@RequiredArgsConstructor
public class OutboxPoller {

    private final OutboxRepository outboxRepo;
    private final KafkaTemplate<String, String> kafka;

    @Scheduled(fixedDelay = 500) // poll every 500 ms
    @Transactional
    public void pollAndPublish() {
        List<OutboxEvent> pending = outboxRepo.findTop50ByStatusOrderByCreatedAtAsc("PENDING");
        for (OutboxEvent event : pending) {
            try {
                kafka.send(event.getTopic(), event.getAggregateId(), event.getPayload())
                        .get(5, TimeUnit.SECONDS); // block until the broker acks
                event.setStatus("SENT");
                event.setSentAt(Instant.now());
            } catch (Exception ex) {
                event.setStatus("FAILED");
                log.error("Failed to publish outbox event id={}", event.getId(), ex);
            }
            outboxRepo.save(event);
        }
    }
}
```
When running multiple poller instances, use SELECT ... FOR UPDATE SKIP LOCKED in the outbox query so two pods never pick up the same rows. In Spring Data JPA, combine @Lock(LockModeType.PESSIMISTIC_WRITE) with the query hint jakarta.persistence.lock.timeout set to -2, which Hibernate interprets as SKIP LOCKED (LockOptions.SKIP_LOCKED).
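Assuming Spring Data JPA on Hibernate, the poller's query could be declared as follows. This is a sketch: the interface name matches the article's code, but the exact hint handling depends on your Hibernate version.

```java
import java.util.List;
import java.util.UUID;

import jakarta.persistence.LockModeType;
import jakarta.persistence.QueryHint;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Lock;
import org.springframework.data.jpa.repository.QueryHints;

public interface OutboxRepository extends JpaRepository<OutboxEvent, UUID> {

    // Generates SELECT ... FOR UPDATE SKIP LOCKED: concurrent pollers skip
    // rows another pod has locked instead of blocking or double-sending.
    @Lock(LockModeType.PESSIMISTIC_WRITE)
    @QueryHints(@QueryHint(name = "jakarta.persistence.lock.timeout",
                           value = "-2")) // -2 = Hibernate's SKIP_LOCKED
    List<OutboxEvent> findTop50ByStatusOrderByCreatedAtAsc(String status);
}
```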
## Approach 2 — Debezium CDC

Debezium tails the database's transaction log (the write-ahead log in PostgreSQL, the binlog in MySQL) and streams every INSERT into the outbox table straight to Kafka: no polling loop, sub-second latency, and no lock contention.
```yaml
# docker-compose.yml — Debezium Kafka Connect worker for the PostgreSQL outbox
version: "3.9"
services:
  debezium:
    image: debezium/connect:2.7
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: debezium-outbox
      CONFIG_STORAGE_TOPIC: debezium.configs
      OFFSET_STORAGE_TOPIC: debezium.offsets
      STATUS_STORAGE_TOPIC: debezium.status
    ports:
      - "8083:8083"
```
Register the connector by POSTing this JSON to http://debezium:8083/connectors. Note that Debezium 2.x requires topic.prefix, plugin.name: pgoutput uses PostgreSQL's built-in logical decoding plugin, and route.topic.replacement of ${routedByValue} routes records to exactly the topic stored in the topic column (the default would prepend outbox.event.):

```json
{
  "name": "outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "app",
    "database.password": "secret",
    "database.dbname": "appdb",
    "topic.prefix": "appdb",
    "plugin.name": "pgoutput",
    "table.include.list": "public.outbox_events",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.route.by.field": "topic",
    "transforms.outbox.route.topic.replacement": "${routedByValue}",
    "transforms.outbox.table.field.event.id": "id",
    "transforms.outbox.table.field.event.key": "aggregate_id",
    "transforms.outbox.table.field.event.payload": "payload",
    "transforms.outbox.table.field.event.type": "event_type"
  }
}
```
## Debezium Outbox Event Router

Debezium's EventRouter SMT (Single Message Transform) reads each outbox row and routes it to the Kafka topic stored in the topic column. No code change is needed in the Spring application: just insert into the outbox table.
```sql
-- Debezium reads the WAL and sees this INSERT — the EventRouter SMT
-- extracts topic='order.created', key='42', payload='{...}'
-- and produces a Kafka record on topic 'order.created' with key '42'
INSERT INTO outbox_events (aggregate_type, aggregate_id, event_type, topic, payload)
VALUES ('Order', '42', 'OrderCreatedEvent', 'order.created', '{"orderId":42,...}');
```
To prevent unbounded table growth, periodically delete rows that have already been relayed. With the polling publisher, filter on status = 'SENT' so PENDING or FAILED rows are never dropped; with Debezium the status column is never updated, so use a purely age-based delete with a retention window longer than any plausible connector outage.

```sql
-- Run via pg_cron or a Spring @Scheduled job (polling-publisher variant)
DELETE FROM outbox_events
WHERE status = 'SENT'
  AND created_at < NOW() - INTERVAL '7 days';
```
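The cleanup can be run as a Spring @Scheduled job instead of pg_cron. A minimal sketch, assuming a JdbcTemplate is available and the polling publisher marks rows SENT (class and schedule are illustrative):

```java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Hypothetical nightly purge of relayed outbox rows. Under Debezium CDC
// rows never become SENT, so drop the status filter and rely on age alone.
@Component
public class OutboxCleanup {

    private final JdbcTemplate jdbc;

    public OutboxCleanup(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Scheduled(cron = "0 0 3 * * *") // every day at 03:00
    public void purge() {
        int deleted = jdbc.update(
                "DELETE FROM outbox_events " +
                "WHERE status = 'SENT' AND created_at < NOW() - INTERVAL '7 days'");
        // deleted = number of purged rows; useful as a metric
    }
}
```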
## Polling vs CDC Comparison

| Aspect | Polling Publisher | Debezium CDC |
|---|---|---|
| Latency | Up to the polling interval (500 ms–5 s) | Sub-second (WAL streaming) |
| Throughput | Limited by poll batch size and interval | Very high (streams the WAL) |
| DB load | Regular SELECT + UPDATE queries | Replication slot; minimal |
| Setup complexity | Low (just a scheduler) | Medium (Kafka Connect + connector) |
| Ops complexity | Low | Medium (connector lifecycle to manage) |
| Database support | Any | PostgreSQL, MySQL, MongoDB, SQL Server |
| Delivery guarantee | At-least-once (needs idempotent consumer) | At-least-once (same) |
Start with the polling publisher for simplicity. Switch to Debezium CDC when latency or DB query load becomes a concern.
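Since both relays are at-least-once, consumers must tolerate redelivery. The core of an idempotent consumer can be sketched in plain Java; names are hypothetical, and in production the seen-ID set would be a database table with a unique constraint on the outbox event's UUID (or a TTL cache), not an in-memory set:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of consumer-side deduplication keyed by outbox event ID.
public class IdempotentHandler {

    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    /** Returns true if the event was processed, false if it was a duplicate. */
    public boolean handle(String eventId) {
        if (!processedIds.add(eventId)) {
            return false; // already seen: skip the redelivered event
        }
        // ... apply the event's business effect here ...
        return true;
    }
}
```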