Contents
- Probe Types Overview
- Spring Boot Actuator Health Endpoints
- Liveness Probe
- Readiness Probe
- Startup Probe (Slow-Start Apps)
- Custom Health Indicators
- Graceful Shutdown
- Common Pitfalls
| Probe | Failure Action | Purpose |
|---|---|---|
| Liveness | Restart the container | Detect deadlocks or stuck processes that can only recover via restart |
| Readiness | Remove pod from Service endpoints (stop sending traffic) | Detect when the app is temporarily unable to serve requests (e.g., DB connection lost) |
| Startup | Restart the container (if still failing after deadline) | Give slow-starting apps time to initialise without triggering liveness restarts |
Each probe uses one of four mechanisms: httpGet, tcpSocket, exec (run a command inside the container), or grpc (available since Kubernetes 1.24, stable since 1.27).
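To make the tcpSocket mechanism concrete, here is a JDK-only sketch of the check it performs: the probe passes if a TCP connection to the port can be opened within the timeout. `TcpProbeSketch` is a hypothetical helper written for illustration; it is not how the kubelet is implemented.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical illustration of a tcpSocket probe: success means a TCP
// connection to host:port could be opened within the timeout.
final class TcpProbeSketch {

    private TcpProbeSketch() {}

    static boolean tcpProbe(String host, int port, int timeoutMillis) {
        // try-with-resources closes the socket as soon as the check is done
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;   // handshake completed: probe success
        } catch (IOException e) {
            return false;  // refused, unreachable or timed out: probe failure
        }
    }

    public static void main(String[] args) {
        // Prints false unless something is listening on port 8080
        System.out.println("tcpSocket probe on 8080: " + tcpProbe("127.0.0.1", 8080, 1000));
    }
}
```

A port that accepts connections says nothing about whether requests are actually being served, which is one reason the examples in this guide use httpGet against Actuator instead.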
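Spring Boot 2.3+ automatically exposes separate /actuator/health/liveness and /actuator/health/readiness endpoints when it detects that it is running on Kubernetes. Setting management.endpoint.health.probes.enabled to true exposes them in every environment, which is useful for local testing.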
```yaml
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus
  endpoint:
    health:
      probes:
        enabled: true        # enables /health/liveness and /health/readiness
      show-details: always   # optional: shows component details
      group:
        readiness:
          include: readinessState,db,redis   # components that make up the readiness group
        liveness:
          include: livenessState,diskSpace
```
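```bash
# Verify the endpoints (with show-details: always, each group lists its components; details abbreviated)
curl http://localhost:8080/actuator/health/liveness
# {"status":"UP","components":{"diskSpace":{...},"livenessState":{"status":"UP"}}}
curl http://localhost:8080/actuator/health/readiness
# {"status":"UP","components":{"db":{...},"readinessState":{"status":"UP"},"redis":{...}}}
```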
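The same check can be made from Java. This sketch uses the JDK's `java.net.http.HttpClient` and applies the rule Kubernetes uses for httpGet probes: any status code from 200 to 399 counts as success. Spring Boot answers 503 when a health group is DOWN, so a failing readiness group fails this check. `HttpProbeSketch` is a hypothetical name, and the URL assumes the app listens on localhost:8080 as in the curl commands.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical Java stand-in for the curl checks, using the httpGet success rule.
final class HttpProbeSketch {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1) // plain HTTP/1.1, no h2c upgrade attempt
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    private HttpProbeSketch() {}

    // httpGet probes treat any status in [200, 400) as success
    static boolean isProbeSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    static boolean httpProbe(String url, Duration timeout) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(timeout) // plays the role of timeoutSeconds
                .GET()
                .build();
        try {
            HttpResponse<Void> response = CLIENT.send(request, HttpResponse.BodyHandlers.discarding());
            return isProbeSuccess(response.statusCode());
        } catch (IOException e) {
            return false; // connection refused or request timed out
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        String url = "http://localhost:8080/actuator/health/readiness";
        System.out.println(url + " -> " + (httpProbe(url, Duration.ofSeconds(5)) ? "success" : "failure"));
    }
}
```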
Liveness probe settings for the container (a fragment of the Deployment's container spec):
```yaml
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  # Don't start checking until the app has had time to start
  initialDelaySeconds: 30
  # How often to check
  periodSeconds: 15
  # Timeout per check
  timeoutSeconds: 5
  # Consecutive successes to become healthy (must be 1 for liveness)
  successThreshold: 1
  # Consecutive failures before restarting the container
  failureThreshold: 3
  # Rough budget before the first restart: 30 + (15 × 3) = 75 seconds
```
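The budget in the last comment is simple arithmetic: initialDelaySeconds plus failureThreshold periods. Below is a minimal sketch of that estimate with hypothetical names. Actual timing varies by a few seconds (timeoutSeconds, when the kubelet's probe ticker fires), so treat the result as a rough figure rather than an exact schedule.

```java
// Hypothetical helper reproducing the rough liveness arithmetic.
final class LivenessBudget {

    private LivenessBudget() {}

    // Time covered by failureThreshold consecutive failed checks, one per period
    static int failureWindowSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }

    // Rough estimate of the time before the first restart of a container that never becomes healthy
    static int roughBudgetSeconds(int initialDelaySeconds, int periodSeconds, int failureThreshold) {
        return initialDelaySeconds + failureWindowSeconds(periodSeconds, failureThreshold);
    }

    public static void main(String[] args) {
        System.out.println(roughBudgetSeconds(30, 15, 3)); // 75
        System.out.println(failureWindowSeconds(15, 3));   // 45
    }
}
```

The failure window alone is roughly how long it takes to detect a process that hangs after it was healthy: about 15 × 3 = 45 seconds with these settings.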
Liveness should only check that the process can respond — not database connectivity. A database outage should make the pod not ready, not restart it. Restarting won't fix a DB that's down, and cascading restarts will make recovery worse.
Readiness probe settings, in the same container spec:
```yaml
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 20  # less than liveness, so readiness is checked sooner
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3      # removed from the Service after 3 failures (about 30 seconds)
```
When a readiness probe fails, the pod is removed from the Service's endpoint list — no new requests are routed to it. The pod is not restarted. Once the dependency recovers, the probe passes and traffic resumes automatically.
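The toggling described above can be modelled as a small state machine over consecutive probe results. This is a simplified, hypothetical model (it ignores timeouts and initialDelaySeconds), not kubelet source; like Kubernetes, it assumes a pod starts out not ready.

```java
// Hypothetical model of readiness: consecutive results flip the pod in or out of the endpoints.
final class ReadinessTracker {

    private final int successThreshold;
    private final int failureThreshold;
    private int consecutiveSuccesses;
    private int consecutiveFailures;
    private boolean ready; // pods start out not ready

    ReadinessTracker(int successThreshold, int failureThreshold) {
        if (successThreshold < 1 || failureThreshold < 1) {
            throw new IllegalArgumentException("thresholds must be >= 1");
        }
        this.successThreshold = successThreshold;
        this.failureThreshold = failureThreshold;
    }

    // Records one probe result and returns whether the pod should receive traffic
    boolean record(boolean probePassed) {
        if (probePassed) {
            consecutiveSuccesses++;
            consecutiveFailures = 0;
            if (consecutiveSuccesses >= successThreshold) {
                ready = true;  // added back to the Service endpoints
            }
        } else {
            consecutiveFailures++;
            consecutiveSuccesses = 0;
            if (consecutiveFailures >= failureThreshold) {
                ready = false; // removed from the endpoints; the container is NOT restarted
            }
        }
        return ready;
    }

    public static void main(String[] args) {
        ReadinessTracker tracker = new ReadinessTracker(1, 3); // values from the readinessProbe above
        boolean[] results = {true, false, false, false, true};
        for (boolean r : results) {
            System.out.println((r ? "pass" : "fail") + " -> ready=" + tracker.record(r));
        }
    }
}
```

With successThreshold 1 and failureThreshold 3, the sequence pass, fail, fail, fail, pass keeps the pod ready through two failures, removes it from the endpoints on the third, and restores it on the next success. Nothing is ever restarted, which is the key difference from liveness.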
Apps with long startup times (Spring Batch jobs, apps with many Flyway migrations) need a startup probe to prevent liveness from killing them before they finish initialising.
```yaml
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  # failureThreshold × periodSeconds = maximum startup time allowed
  failureThreshold: 30   # 30 attempts
  periodSeconds: 10      # = 300 seconds (5 minutes) before giving up
  timeoutSeconds: 5
# While startupProbe is running, liveness and readiness probes are DISABLED.
# Once startupProbe succeeds once, normal probes take over.
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 15
  failureThreshold: 3
```
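Because the startup budget is failureThreshold × periodSeconds, sizing the probe is a ceiling division of the worst-case startup time by the period. A sketch with hypothetical names; the worst-case startup time is an assumption you would measure yourself (for example, a boot that runs all pending Flyway migrations) and pad with some headroom.

```java
// Hypothetical helper for sizing a startupProbe.
final class StartupProbeSizing {

    private StartupProbeSizing() {}

    // Longest startup the probe tolerates before the container is restarted
    static int maxStartupSeconds(int failureThreshold, int periodSeconds) {
        return failureThreshold * periodSeconds;
    }

    // Smallest failureThreshold whose budget covers the given startup time
    static int requiredFailureThreshold(int worstCaseStartupSeconds, int periodSeconds) {
        if (periodSeconds <= 0) {
            throw new IllegalArgumentException("periodSeconds must be > 0");
        }
        // Ceiling division: a partial period still needs one more attempt
        return Math.max(1, (worstCaseStartupSeconds + periodSeconds - 1) / periodSeconds);
    }

    public static void main(String[] args) {
        System.out.println(maxStartupSeconds(30, 10));         // 300
        System.out.println(requiredFailureThreshold(420, 10)); // 42
    }
}
```

With periodSeconds 10, a measured worst case of 301 seconds needs failureThreshold 31, not 30.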
A custom HealthIndicator reports on an external dependency, and the AvailabilityChangeEvent API lets the application change its own readiness state:
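```java
// ExternalServiceHealthIndicator.java
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Custom health indicator, registered under the name "externalService".
// It only affects /actuator/health/readiness if "externalService" is added
// to management.endpoint.health.group.readiness.include.
@Component
public class ExternalServiceHealthIndicator implements HealthIndicator {

    private final ExternalPaymentGateway gateway;

    public ExternalServiceHealthIndicator(ExternalPaymentGateway gateway) {
        this.gateway = gateway;
    }

    @Override
    public Health health() {
        try {
            boolean reachable = gateway.ping(); // lightweight check
            if (reachable) {
                return Health.up()
                        .withDetail("gateway", "reachable")
                        .build();
            }
            return Health.down()
                    .withDetail("gateway", "unreachable")
                    .build();
        } catch (Exception e) {
            return Health.down(e).build();
        }
    }
}

// CacheWarmupListener.java
import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.ApplicationListener;
import org.springframework.stereotype.Component;

// Programmatically signal readiness (e.g., after cache warm-up completes)
@Component
public class CacheWarmupListener implements ApplicationListener<ApplicationReadyEvent> {

    private final ApplicationEventPublisher eventPublisher;

    public CacheWarmupListener(ApplicationEventPublisher eventPublisher) {
        this.eventPublisher = eventPublisher;
    }

    @Override
    public void onApplicationEvent(ApplicationReadyEvent event) {
        warmUpCaches();
        // Update the readiness state reported by /actuator/health/readiness
        AvailabilityChangeEvent.publish(eventPublisher, this, ReadinessState.ACCEPTING_TRAFFIC);
    }

    private void warmUpCaches() {
        // Load frequently used data into caches here
    }
}
```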
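`ExternalServiceHealthIndicator` calls `gateway.ping()` synchronously. If the gateway hangs instead of failing, the health request can outlive the probe's timeoutSeconds (5 s above): the kubelet records a failure anyway, but a request thread stays blocked. One option is to bound the check yourself. The sketch below is plain JDK code with hypothetical names (`BoundedHealthCheck`, `Status`); inside the indicator you would map UP and DOWN to `Health.up()` and `Health.down()`.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical helper: runs a dependency check but stops waiting after a timeout.
final class BoundedHealthCheck {

    enum Status { UP, DOWN }

    private BoundedHealthCheck() {}

    static Status check(BooleanSupplier ping, Duration timeout) {
        // Runs the ping asynchronously (default async executor) so the caller can stop waiting
        CompletableFuture<Boolean> result = CompletableFuture.supplyAsync(ping::getAsBoolean);
        try {
            return result.get(timeout.toMillis(), TimeUnit.MILLISECONDS) ? Status.UP : Status.DOWN;
        } catch (TimeoutException e) {
            // Stop waiting; cancel() does not interrupt the thread still running the ping
            result.cancel(true);
            return Status.DOWN;
        } catch (ExecutionException e) {
            return Status.DOWN; // the ping itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.DOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(check(() -> true, Duration.ofSeconds(2))); // UP
        System.out.println(check(() -> {
            try {
                Thread.sleep(5_000); // simulates a hung gateway
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return true;
        }, Duration.ofMillis(200)));                                   // DOWN
    }
}
```

The bound should sit comfortably below timeoutSeconds so the endpoint still answers (with DOWN) before the kubelet gives up on the request.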
When Kubernetes terminates a pod it sends SIGTERM. Spring Boot 2.3+ supports graceful shutdown — it stops accepting new requests and waits for in-flight requests to complete before exiting.
```yaml
# application.yml: enable graceful shutdown
server:
  shutdown: graceful   # wait for in-flight requests to complete
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s   # max wait time per phase
```
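Conceptually, `server.shutdown: graceful` means: stop accepting new work, then wait a bounded time for in-flight work. The sketch below shows that pattern with a plain `ExecutorService` standing in for the web server's request threads. It illustrates the idea only; it is not Spring Boot's implementation, and `GracefulDrain` is a hypothetical name.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of "stop accepting, then drain with a timeout".
final class GracefulDrain {

    private GracefulDrain() {}

    // Returns true if all in-flight tasks finished within the timeout
    static boolean drain(ExecutorService workers, Duration timeout) {
        workers.shutdown(); // new submissions are rejected from now on
        try {
            if (workers.awaitTermination(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        workers.shutdownNow(); // timeout reached: interrupt whatever is still running
        return false;
    }

    // The JVM runs shutdown hooks on SIGTERM, but never on SIGKILL
    static void drainOnSigterm(ExecutorService workers, Duration timeout) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> drain(workers, timeout)));
    }

    public static void main(String[] args) {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        workers.submit(() -> System.out.println("handling an in-flight request"));
        System.out.println("drained cleanly: " + drain(workers, Duration.ofSeconds(30)));
    }
}
```

Because SIGKILL skips shutdown hooks entirely, any drain like this has to finish inside terminationGracePeriodSeconds.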
```yaml
# Deployment spec: full graceful shutdown config
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # K8s waits this long before SIGKILL
      containers:
        - name: payment-service
          lifecycle:
            preStop:
              exec:
                # Sleep briefly before SIGTERM so the load balancer and kube-proxies
                # have time to stop routing new requests to this pod
                command: ["/bin/sh", "-c", "sleep 10"]
```
The full shutdown sequence:
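- Pod is marked Terminating and its removal from the Service endpoints begins. kube-proxies and load balancers pick this up asynchronously, so some traffic can still arrive for a few seconds.
- preStop hook executes: sleep 10 gives the endpoint removal time to propagate to all kube-proxies before SIGTERM is sent.
- SIGTERM is sent to the container process, and Spring Boot begins graceful shutdown.
- Spring Boot drains in-flight requests (up to timeout-per-shutdown-phase).
- If the process hasn't exited within terminationGracePeriodSeconds (counted from the moment the pod was marked Terminating, so the preStop sleep uses part of it), Kubernetes sends SIGKILL.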
Set terminationGracePeriodSeconds to at least the preStop sleep + timeout-per-shutdown-phase + a 10 s buffer; with the values above that is 10 + 30 + 10 = 50 seconds, inside the configured 60. A common mistake is setting the Spring Boot timeout longer than the K8s grace period: the JVM gets SIGKILL before it finishes draining.
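The sizing rule is easy to check mechanically. A small sketch with hypothetical names, treating timeout-per-shutdown-phase as a single phase as the rule does (an app with several lifecycle phases could take longer than one timeout).

```java
// Hypothetical helper for checking terminationGracePeriodSeconds.
final class GracePeriodCheck {

    private GracePeriodCheck() {}

    static int minimumGracePeriodSeconds(int preStopSleepSeconds, int springShutdownTimeoutSeconds,
                                         int bufferSeconds) {
        return preStopSleepSeconds + springShutdownTimeoutSeconds + bufferSeconds;
    }

    static boolean isSufficient(int terminationGracePeriodSeconds, int preStopSleepSeconds,
                                int springShutdownTimeoutSeconds, int bufferSeconds) {
        return terminationGracePeriodSeconds >= minimumGracePeriodSeconds(
                preStopSleepSeconds, springShutdownTimeoutSeconds, bufferSeconds);
    }

    public static void main(String[] args) {
        System.out.println(minimumGracePeriodSeconds(10, 30, 10)); // 50
        System.out.println(isSufficient(60, 10, 30, 10));          // true
        System.out.println(isSufficient(30, 10, 30, 10));          // false: the K8s default of 30 s is too short here
    }
}
```

With this section's values (preStop sleep 10, timeout-per-shutdown-phase 30s, 10 s buffer) the minimum is 50 seconds, so the configured 60 passes, while the Kubernetes default of 30 seconds would not.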
- Liveness checks a dependency (DB, Redis) — causes restart loops when a dependency is down. Keep liveness simple.
- No startup probe for slow apps: liveness kills the app before it finishes Flyway migrations. Add a startup probe with failureThreshold × periodSeconds ≥ maximum startup time.
- initialDelaySeconds too short (when there is no startup probe): liveness fires before the Spring context has loaded, causing unnecessary restarts on first boot.
- No preStop sleep: requests are still routed to the pod for a few seconds after SIGTERM arrives (endpoint propagation delay). The preStop sleep avoids most of the 5xx errors this causes during rolling deploys.
- terminationGracePeriodSeconds smaller than the shutdown timeout: the JVM gets SIGKILL before draining completes. Always keep terminationGracePeriodSeconds > preStop sleep + Spring shutdown timeout.