Kubernetes — Liveness, Readiness & Startup Probes + Graceful Shutdown

Probe Types Overview
Spring Boot Actuator Health Endpoints
Liveness Probe
Readiness Probe
Startup Probe (Slow-Start Apps)
Custom Health Indicators
Graceful Shutdown
Common Pitfalls

Probe	Failure Action	Purpose
Liveness	Restart the container	Detect deadlocks or stuck processes that can only recover via restart
Readiness	Remove pod from Service endpoints (stop sending traffic)	Detect when the app is temporarily unable to serve requests (e.g., DB connection lost)
Startup	Restart the container (if still failing after deadline)	Give slow-starting apps time to initialise without triggering liveness restarts

Each probe can use one of four mechanisms: httpGet, tcpSocket, exec (command), or grpc.
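To make the httpGet mechanism concrete, here is a minimal sketch of a single probe attempt using only the JDK. Kubernetes treats any HTTP status from 200 up to (but not including) 400 as success; a timeout or connection error counts as a failure. The class name HttpProbeSketch, the /healthz and /broken paths, and the demo server are illustrative assumptions, not part of Kubernetes or Spring Boot.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class HttpProbeSketch {

    // An httpGet probe succeeds for any status code in the range [200, 400).
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    // One probe attempt; I/O errors and timeouts are reported as failures.
    static boolean probeOnce(URI uri, Duration timeout) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .connectTimeout(timeout)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return isSuccess(response.statusCode());
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Hypothetical stand-in for an application: /healthz answers 200, /broken answers 503.
    static HttpServer startDemoServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/healthz", ex -> { ex.sendResponseHeaders(200, -1); ex.close(); });
            server.createContext("/broken", ex -> { ex.sendResponseHeaders(503, -1); ex.close(); });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startDemoServer();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            Duration timeout = Duration.ofSeconds(5);
            System.out.println("healthz: " + probeOnce(URI.create(base + "/healthz"), timeout));
            System.out.println("broken:  " + probeOnce(URI.create(base + "/broken"), timeout));
        } finally {
            server.stop(0);
        }
    }
}
```

The kubelet's real prober has more behaviour (custom headers, redirect handling), so treat this only as a model of the pass/fail rule.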

Spring Boot 2.3+ automatically exposes separate /actuator/health/liveness and /actuator/health/readiness endpoints when running on Kubernetes.

# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus
  endpoint:
    health:
      probes:
        enabled: true        # enables /health/liveness and /health/readiness
      show-details: always   # optional: shows component details
      group:
        readiness:
          include: readinessState,db,redis   # custom readiness components
        liveness:
          include: livenessState,diskSpace

# Verify endpoints
curl http://localhost:8080/actuator/health/liveness
# {"status":"UP"}

curl http://localhost:8080/actuator/health/readiness
# {"status":"UP","components":{"db":{"status":"UP"},"redis":{"status":"UP"}}}

The YAML below configures a liveness probe against the Actuator liveness endpoint. Adjust the values to match your environment.

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  # Don't start checking until the app has had time to start
  initialDelaySeconds: 30
  # How often to check
  periodSeconds: 15
  # Timeout per check
  timeoutSeconds: 5
  # Consecutive successes to become healthy (must be 1 for liveness)
  successThreshold: 1
  # Consecutive failures before restarting the container
  failureThreshold: 3
  # Approximate liveness budget: 30 + (15 × 3) = 75 seconds before the first restart

Liveness should only check that the process can respond — not database connectivity. A database outage should make the pod not ready, not restart it. Restarting won't fix a DB that's down, and cascading restarts will make recovery worse.

The YAML below configures the readiness probe against the Actuator readiness endpoint. Adjust the values to match your environment.

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 20   # less than liveness, so readiness is checked sooner
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3       # remove from Service after 3 failures (30 seconds)

When a readiness probe fails, the pod is removed from the Service's endpoint list — no new requests are routed to it. The pod is not restarted. Once the dependency recovers, the probe passes and traffic resumes automatically.
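The threshold behaviour can be modelled as a small state machine: consecutive failures must reach failureThreshold before the pod is marked not ready, and consecutive successes must reach successThreshold before it is marked ready again. The sketch below is a simplified model for reasoning about the settings, not the kubelet's actual implementation; the class name ProbeStateTracker is hypothetical.

```java
final class ProbeStateTracker {

    private final int failureThreshold;
    private final int successThreshold;
    private boolean ready;             // starts as not ready until a probe succeeds
    private int consecutiveFailures;
    private int consecutiveSuccesses;

    ProbeStateTracker(int failureThreshold, int successThreshold) {
        if (failureThreshold < 1 || successThreshold < 1) {
            throw new IllegalArgumentException("thresholds must be at least 1");
        }
        this.failureThreshold = failureThreshold;
        this.successThreshold = successThreshold;
    }

    // Records one probe result and returns whether the pod counts as ready afterwards.
    boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveFailures = 0;
            consecutiveSuccesses++;
            if (consecutiveSuccesses >= successThreshold) {
                ready = true;
            }
        } else {
            consecutiveSuccesses = 0;
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                ready = false;
            }
        }
        return ready;
    }

    public static void main(String[] args) {
        // Same thresholds as the readiness probe above: failureThreshold 3, successThreshold 1.
        ProbeStateTracker tracker = new ProbeStateTracker(3, 1);
        boolean[] results = {true, false, false, false, true};
        for (boolean result : results) {
            System.out.println("probe=" + result + " ready=" + tracker.record(result));
        }
    }
}
```

With these thresholds a single failed check does not remove the pod from the Service; only the third consecutive failure does, and one success restores it.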

Apps with long startup times (Spring Batch jobs, apps with many Flyway migrations) need a startup probe to prevent liveness from killing them before they finish initialising.

startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  # failureThreshold × periodSeconds = maximum startup time allowed
  failureThreshold: 30   # 30 attempts
  periodSeconds: 10      # = 300 seconds (5 minutes) before giving up
  timeoutSeconds: 5

# While startupProbe is running, liveness and readiness probes are DISABLED.
# Once startupProbe succeeds once, normal probes take over.
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 15
  failureThreshold: 3
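As a worked example of the sizing rule failureThreshold × periodSeconds ≥ maximum startup time, the sketch below computes the startup budget for given settings and the smallest failureThreshold that covers a measured startup time. The class and method names are illustrative, and the 240 and 245 second startup times are made-up inputs.

```java
final class StartupProbeSizing {

    // Total time the startup probe allows before the container is restarted.
    static int budgetSeconds(int failureThreshold, int periodSeconds) {
        return failureThreshold * periodSeconds;
    }

    // Smallest failureThreshold with failureThreshold * periodSeconds >= maxStartupSeconds.
    static int failureThresholdFor(int maxStartupSeconds, int periodSeconds) {
        if (periodSeconds <= 0 || maxStartupSeconds < 0) {
            throw new IllegalArgumentException("periodSeconds must be > 0 and maxStartupSeconds >= 0");
        }
        int roundedUp = (maxStartupSeconds + periodSeconds - 1) / periodSeconds;
        return Math.max(1, roundedUp);
    }

    public static void main(String[] args) {
        System.out.println(budgetSeconds(30, 10));        // 300
        System.out.println(failureThresholdFor(240, 10)); // 24
        System.out.println(failureThresholdFor(245, 10)); // 25
    }
}
```

In practice it is reasonable to add headroom on top of the computed value, since startup time varies with node load and migration volume.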

The classes below implement a custom health indicator and a listener that publishes the readiness state programmatically. Key points are highlighted in the inline comments.

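import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Custom health indicator, registered under the name "externalService".
// It contributes to /actuator/health/readiness only if that name is added
// to management.endpoint.health.group.readiness.include.
@Component
public class ExternalServiceHealthIndicator implements HealthIndicator {

    private final ExternalPaymentGateway gateway;

    // Constructor injection initialises the final field
    public ExternalServiceHealthIndicator(ExternalPaymentGateway gateway) {
        this.gateway = gateway;
    }

    @Override
    public Health health() {
        try {
            boolean reachable = gateway.ping(); // lightweight check
            if (reachable) {
                return Health.up()
                        .withDetail("gateway", "reachable")
                        .build();
            }
            return Health.down()
                    .withDetail("gateway", "unreachable")
                    .build();
        } catch (Exception e) {
            return Health.down(e).build();
        }
    }
}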

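import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.ApplicationListener;
import org.springframework.stereotype.Component;

// Programmatically signal readiness (e.g., after cache warm-up completes)
@Component
public class CacheWarmupListener implements ApplicationListener<ApplicationReadyEvent> {

    private final ApplicationEventPublisher eventPublisher;

    public CacheWarmupListener(ApplicationEventPublisher eventPublisher) {
        this.eventPublisher = eventPublisher;
    }

    @Override
    public void onApplicationEvent(ApplicationReadyEvent event) {
        // Warm up caches...
        warmUpCaches();
        // Signal Kubernetes the pod is ready to receive traffic
        AvailabilityChangeEvent.publish(eventPublisher, this, ReadinessState.ACCEPTING_TRAFFIC);
    }

    private void warmUpCaches() {
        // application-specific cache loading goes here
    }
}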

When Kubernetes terminates a pod it sends SIGTERM. Spring Boot 2.3+ supports graceful shutdown — it stops accepting new requests and waits for in-flight requests to complete before exiting.

# application.yml: enable graceful shutdown
server:
  shutdown: graceful   # wait for in-flight requests to complete
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s   # max wait time per phase
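The drain behaviour (stop accepting new work, then wait for in-flight work up to a deadline) follows the same pattern as shutting down a JDK ExecutorService. The sketch below is an analogy using only the JDK, not Spring Boot's actual implementation; the DrainSketch class and the 200 ms simulated request are assumptions for illustration.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DrainSketch {

    // Stops accepting new tasks, then waits up to `timeout` for in-flight tasks.
    // Returns true if everything finished in time, false if the deadline was hit.
    static boolean drain(ExecutorService pool, Duration timeout) {
        pool.shutdown(); // new submissions are rejected; already submitted tasks continue
        try {
            boolean drained = pool.awaitTermination(timeout.toMillis(), TimeUnit.MILLISECONDS);
            if (!drained) {
                pool.shutdownNow(); // deadline reached: interrupt whatever is still running
            }
            return drained;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger completed = new AtomicInteger();

        // Simulated in-flight request that needs 200 ms to finish.
        pool.submit(() -> {
            Thread.sleep(200);
            return completed.incrementAndGet();
        });

        boolean drained = drain(pool, Duration.ofSeconds(5));
        System.out.println("drained=" + drained + " completed=" + completed.get());

        try {
            pool.submit(() -> 1);
        } catch (RejectedExecutionException e) {
            System.out.println("new work rejected after shutdown");
        }
    }
}
```

The timeout passed to drain plays the role of timeout-per-shutdown-phase: work that does not finish within it is cut off rather than awaited indefinitely.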

# Deployment spec: full graceful shutdown config
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # K8s waits this long before SIGKILL
      containers:
        - name: payment-service
          lifecycle:
            preStop:
              exec:
                # Sleep briefly before SIGTERM so load balancer has time
                # to stop routing new requests to this pod
                command: ["/bin/sh", "-c", "sleep 10"]

The full shutdown sequence:

1. Pod is marked Terminating. Removal from Service endpoints starts at the same time but propagates asynchronously, so some traffic can still arrive for a few seconds.
2. preStop hook executes: sleep 10 gives the endpoint removal time to propagate to the kube-proxies before SIGTERM.
3. SIGTERM is sent to the container process, and Spring Boot begins graceful shutdown.
4. Spring Boot drains in-flight requests (up to timeout-per-shutdown-phase).
5. If the process hasn't exited within terminationGracePeriodSeconds (counted from the start of termination, so it includes the preStop hook), Kubernetes sends SIGKILL.

Set terminationGracePeriodSeconds to at least preStop sleep + timeout-per-shutdown-phase + 10s buffer. With the values above that is 10 + 30 + 10 = 50 seconds, which fits within the configured 60. A common mistake is setting the Spring Boot timeout longer than the K8s grace period, in which case the JVM gets SIGKILL before it finishes draining.
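The sizing rule can be checked mechanically, for example in a deployment review script. The sketch below is a minimal illustration; the class and method names are hypothetical, and it assumes a single shutdown phase dominates, since timeout-per-shutdown-phase applies to each phase separately.

```java
final class GracePeriodCheck {

    // Minimum grace period: preStop sleep + Spring shutdown timeout + safety buffer.
    static int minimumGracePeriodSeconds(int preStopSleepSeconds,
                                         int shutdownTimeoutSeconds,
                                         int bufferSeconds) {
        return preStopSleepSeconds + shutdownTimeoutSeconds + bufferSeconds;
    }

    // True when the configured grace period covers the whole shutdown sequence.
    static boolean isSufficient(int terminationGracePeriodSeconds,
                                int preStopSleepSeconds,
                                int shutdownTimeoutSeconds,
                                int bufferSeconds) {
        return terminationGracePeriodSeconds
                >= minimumGracePeriodSeconds(preStopSleepSeconds, shutdownTimeoutSeconds, bufferSeconds);
    }

    public static void main(String[] args) {
        // Values from the configuration above: 60s grace, 10s preStop, 30s timeout, 10s buffer.
        System.out.println(minimumGracePeriodSeconds(10, 30, 10)); // 50
        System.out.println(isSufficient(60, 10, 30, 10));          // true

        // Raising the Spring timeout to 60s without raising the grace period breaks the rule.
        System.out.println(isSufficient(60, 10, 60, 10));          // false
    }
}
```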

Liveness checks a dependency (DB, Redis) — causes restart loops when a dependency is down. Keep liveness simple.
No startup probe for slow apps — liveness kills the app before it finishes Flyway migrations. Add a startup probe with failureThreshold × period ≥ max startup time.
initialDelaySeconds too short — liveness fires before Spring context is loaded, causing unnecessary restart on first boot.
No preStop sleep — requests are still routed to the pod for a few seconds after SIGTERM arrives (endpoint propagation delay). The preStop sleep prevents 5xx errors during rolling deploys.
terminationGracePeriodSeconds smaller than shutdown timeout — the JVM gets SIGKILL before draining completes. Always: terminationGracePeriod > preStop + Spring shutdown timeout.
