Spring Boot + CRaC — Coordinated Restore at Checkpoint

Contents
How CRaC Works
Dependency & CRaC JDK
Taking a Checkpoint
Custom Resource Hooks
Spring Boot Auto-Configuration for CRaC
Docker Integration
Kubernetes Deployment
CRaC vs GraalVM vs SnapStart

CRaC uses Linux CRIU (Checkpoint/Restore In Userspace) at its core. The lifecycle has two phases:

Checkpoint phase: the application starts normally, loads classes, wires the Spring context, and JIT-compiles hot code paths as it warms up. At a designated point (triggered via JVM API or signal), CRIU freezes all threads and writes the entire process state (heap, thread stacks, file descriptors, sockets, JIT-compiled code) to image files on disk. The process then exits.
Restore phase: at deployment time, CRIU reads the image files and reconstructs the process in memory. The JVM resumes from exactly where it left off. Startup time is typically 50–200 ms because class loading, JIT compilation, and Spring context initialisation are already captured in the image.

Aspect	Normal JVM startup	CRaC restore
Class loading	✖ Done at startup	✔ Already in image
Spring context init	✖ Done at startup	✔ Already in image
JIT warm-up	✖ Happens gradually under load	✔ Pre-compiled code in image
File descriptors / sockets	✔ Fresh connections	✖ Must be recreated on restore
Startup latency	5–15 seconds	50–200 ms

Unlike GraalVM Native Image, CRaC works with any JVM language and framework — no compilation changes, no reflection configuration, no build-time class analysis. The trade-off is the extra checkpoint step in the build pipeline.

CRaC requires a CRaC-enabled JDK (Azul Zulu builds for Linux x64 are the most commonly used) and the org.crac API library on the classpath, which Spring uses to register its checkpoint/restore hooks.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

<!-- org.crac API: delegates to the JDK's CRaC implementation when one is present -->
<dependency>
    <groupId>org.crac</groupId>
    <artifactId>crac</artifactId>
    <version>1.4.0</version>
</dependency>

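# Download Azul Zulu JDK with CRaC support (Linux x64)
curl -L -o zulu-crac.tar.gz \
  https://cdn.azul.com/zulu/bin/zulu21.36.17-ca-crac-jdk21.0.4-linux_x64.tar.gz

# Unpack into a fixed directory name, dropping the versioned top-level folder
mkdir -p zulu21-crac
tar xzf zulu-crac.tar.gz -C zulu21-crac --strip-components=1

# Verify CRaC support: a JDK without CRaC rejects the flag as an unrecognized VM option
./zulu21-crac/bin/java -XX:CRaCCheckpointTo=/tmp/test -version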

A checkpoint can be triggered in two ways: programmatically via the CRaC API inside the application, or externally via the jcmd tool or a UNIX signal. The programmatic approach is simpler and reproducible in build pipelines.

# Run the application with the checkpoint output directory specified
java -XX:CRaCCheckpointTo=/path/to/checkpoint -jar target/myapp.jar &
APP_PID=$!   # PID of the JVM just started; more reliable than $(jps -q), which lists every JVM

# Wait for the application to fully start and reach steady state
sleep 10

# Trigger the checkpoint externally using jcmd
jcmd "$APP_PID" JDK.checkpoint

# The JVM writes the checkpoint files and exits
# /path/to/checkpoint/ now contains the process image files
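The fixed sleep in the script above is a guess at how long startup takes. A build pipeline could instead poll a health endpoint until the application reports ready, and only then run jcmd. The following is a minimal sketch using only the JDK's built-in HTTP client. The class name ReadinessWaiter, the default URL (Spring Boot Actuator's readiness path on port 8080), and the timeout values are assumptions chosen for illustration; none of them is part of CRaC or Spring.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Polls an HTTP health endpoint until it answers 200 or the time budget runs out. */
class ReadinessWaiter {

    /**
     * @param url      health endpoint to poll (assumed to return 200 once the app is ready)
     * @param timeout  overall time budget
     * @param interval pause between attempts
     * @return true if the endpoint returned HTTP 200 within the budget
     */
    static boolean waitUntilReady(URI url, Duration timeout, Duration interval)
            throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(url)
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        long deadline = System.nanoTime() + timeout.toNanos();
        while (deadline - System.nanoTime() > 0) {
            try {
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                if (response.statusCode() == 200) {
                    return true;
                }
            } catch (IOException e) {
                // Connection refused or timed out: the server is not up yet, so retry.
            }
            Thread.sleep(interval.toMillis());
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default; pass the real endpoint as the first argument.
        URI url = URI.create(args.length > 0 ? args[0]
                : "http://localhost:8080/actuator/health/readiness");
        boolean ready = waitUntilReady(url, Duration.ofSeconds(60), Duration.ofMillis(500));
        System.exit(ready ? 0 : 1);
    }
}
```

The process exits with status 0 once the endpoint is ready and 1 on timeout, so a shell script can run it in place of the sleep and abort the checkpoint if the application never comes up.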

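import java.util.Arrays;
import org.crac.Core;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Programmatic checkpoint, triggered once the application context has started
@SpringBootApplication
public class MyApp implements CommandLineRunner {
    private static final Logger log = LoggerFactory.getLogger(MyApp.class);

    public static void main(String[] args) {
        SpringApplication.run(MyApp.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        // Runners execute after the context is refreshed, so beans are fully initialised here
        if (Arrays.asList(args).contains("--checkpoint")) {
            log.info("Taking CRaC checkpoint...");
            // Static method on Core; the JVM must be started with -XX:CRaCCheckpointTo
            Core.checkpointRestore();
            // Execution resumes here after restore
            log.info("Restored from checkpoint!");
        }
    }
}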
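A programmatic checkpoint only makes sense when the JVM was launched with a checkpoint target; without the -XX:CRaCCheckpointTo flag the call is expected to fail rather than write an image. One way to guard against that is to inspect the JVM's launch arguments before attempting the checkpoint. The sketch below uses only the standard management API. The class and method names (CheckpointGuard, checkpointDir) are hypothetical, and matching on the flag text is a heuristic, not an official CRaC detection mechanism.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

/** Heuristic check for the checkpoint flag among the JVM's launch arguments. */
class CheckpointGuard {

    static final String FLAG = "-XX:CRaCCheckpointTo=";

    /** Returns the directory given with the first non-empty checkpoint flag, if any. */
    static Optional<String> checkpointDir(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith(FLAG))
                .map(arg -> arg.substring(FLAG.length()))
                .filter(dir -> !dir.isEmpty())
                .findFirst();
    }

    /** Applies the same check to the arguments of the running JVM. */
    static Optional<String> currentJvmCheckpointDir() {
        return checkpointDir(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }

    public static void main(String[] args) {
        System.out.println(currentJvmCheckpointDir()
                .map(dir -> "Checkpoint target: " + dir)
                .orElse("No -XX:CRaCCheckpointTo flag found; skipping checkpoint"));
    }
}
```

An application could call currentJvmCheckpointDir() alongside the --checkpoint argument check and log a clear message instead of failing when the flag is missing.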

# Restore from checkpoint — instant startup java -XX:CRaCRestoreFrom=/path/to/checkpoint

File descriptors, network connections, and any time-sensitive state become stale after restore. Register a Resource hook to close these before checkpoint and re-open them after restore. Spring Boot 3.2+ does this automatically for JDBC pools, caches, and embedded servers.

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.stereotype.Component;

@Component
public class RedisConnectionHook implements Resource {
    private static final Logger log = LoggerFactory.getLogger(RedisConnectionHook.class);
    private final LettuceConnectionFactory connectionFactory;

    public RedisConnectionHook(LettuceConnectionFactory connectionFactory) {
        this.connectionFactory = connectionFactory;
        // Register with the global CRaC context
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) {
        // Called just before checkpoint: release connections so no open sockets end up in the image
        log.info("CRaC beforeCheckpoint: closing Redis connections");
        // Assumes Spring Data Redis 3.2+, where the factory has start()/stop();
        // destroy() would leave the factory unusable after restore
        connectionFactory.stop();
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) {
        // Called immediately after restore: bring the factory back up with fresh sockets
        log.info("CRaC afterRestore: reconnecting to Redis");
        connectionFactory.start();
    }
}
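When several hooks are registered, the order of the callbacks matters: a component should release its resources before the things it depends on are closed, and those dependencies should be back before the component reopens. The org.crac global context is documented to call beforeCheckpoint in reverse registration order and afterRestore in registration order. The sketch below models only that ordering with hypothetical stand-in types (Hook, HookRegistry, LoggingHook); it does not use the real library and takes no actual checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for org.crac.Resource, reduced to the two callbacks. */
interface Hook {
    void beforeCheckpoint();
    void afterRestore();
}

/** Hypothetical stand-in for the global context; it models notification order only. */
class HookRegistry {
    private final List<Hook> hooks = new ArrayList<>();

    void register(Hook hook) {
        hooks.add(hook);
    }

    /** Notifies hooks in reverse registration order, so dependents close first. */
    void checkpoint() {
        for (int i = hooks.size() - 1; i >= 0; i--) {
            hooks.get(i).beforeCheckpoint();
        }
    }

    /** Notifies hooks in registration order, so dependencies reopen first. */
    void restore() {
        for (Hook hook : hooks) {
            hook.afterRestore();
        }
    }
}

/** A named resource that appends every state change to a shared log. */
class LoggingHook implements Hook {
    private final String name;
    private final List<String> log;

    LoggingHook(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    @Override
    public void beforeCheckpoint() {
        log.add("close " + name);
    }

    @Override
    public void afterRestore() {
        log.add("open " + name);
    }
}

class HookOrderDemo {
    /** Registers a pool, then a repository that depends on it, and runs one cycle. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        HookRegistry registry = new HookRegistry();
        registry.register(new LoggingHook("connection-pool", log));
        registry.register(new LoggingHook("repository", log));
        registry.checkpoint();
        registry.restore();
        return log;
    }

    public static void main(String[] args) {
        // Prints: close repository, close connection-pool, open connection-pool, open repository
        run().forEach(System.out::println);
    }
}
```

Under this model, registering a dependency before the components that use it gives the safe sequence on both sides of the checkpoint.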

Spring Boot 3.2+ (built on Spring Framework 6.1) includes CRaC-aware lifecycle management. When the org.crac API is on the classpath, Spring registers a CRaC resource that stops lifecycle-managed beans before a checkpoint and restarts them after restore, so the following components need no custom code.

Component	beforeCheckpoint action	afterRestore action
HikariCP connection pool	Evict all connections (closes idle + active)	Refill pool with fresh connections
Embedded Tomcat / Netty	Pause acceptors (stop accepting new connections)	Resume acceptors
Spring Cache (Caffeine)	Clear caches (stale data may be wrong after restore)	Caches refill on demand
Scheduled tasks	Cancel pending scheduled executions	Reschedule tasks

# application.yml: no CRaC-specific properties needed for built-in components
# Spring enables its checkpoint/restore support when the org.crac API is on the classpath
spring:
  datasource:
    hikari:
      max-lifetime: 1800000  # 30 min: upper bound on connection age before the pool replaces it
  lifecycle:
    timeout-per-shutdown-phase: 30s
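Restore can happen hours or days after the image was written, which is why cached or otherwise time-sensitive state deserves attention: the heap comes back exactly as it was, but the wall clock has moved on. The sketch below illustrates the effect with a value that records when it was loaded and is checked against an explicit "now". TimestampedValue and the chosen timestamps are hypothetical; no real checkpoint is taken, the two instants simply stand in for checkpoint time and restore time.

```java
import java.time.Duration;
import java.time.Instant;

/** A cached value that remembers the wall-clock time at which it was loaded. */
class TimestampedValue<T> {
    private final T value;
    private final Instant loadedAt;

    TimestampedValue(T value, Instant loadedAt) {
        this.value = value;
        this.loadedAt = loadedAt;
    }

    T value() {
        return value;
    }

    /** True while the value is younger than maxAge at the given instant. */
    boolean isFresh(Instant now, Duration maxAge) {
        return Duration.between(loadedAt, now).compareTo(maxAge) < 0;
    }
}

class StaleAfterRestoreDemo {
    public static void main(String[] args) {
        Duration maxAge = Duration.ofMinutes(30);
        Instant checkpointTime = Instant.parse("2024-01-01T00:00:00Z");

        // Loaded during warm-up, one minute before the checkpoint was taken.
        TimestampedValue<String> price =
                new TimestampedValue<>("9.99", checkpointTime.minusSeconds(60));

        // At checkpoint time the entry is one minute old and still fresh.
        System.out.println(price.isFresh(checkpointTime, maxAge)); // prints true

        // The image is restored two days later: same heap contents, different wall clock.
        Instant restoreTime = checkpointTime.plus(Duration.ofDays(2));
        System.out.println(price.isFresh(restoreTime, maxAge)); // prints false
    }
}
```

Clearing such entries in beforeCheckpoint, or re-validating them against the current time in afterRestore, avoids serving values that were only valid when the image was built.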

The standard pattern is a multi-stage Docker build: a "build" stage compiles the JAR, a "checkpoint" stage runs the app and takes a snapshot, and a "runtime" stage copies the snapshot into a lean image. The runtime image restores from the checkpoint instead of starting the JVM fresh.

# Stage 1: build the JAR
FROM maven:3.9-eclipse-temurin-21 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline -q
COPY src ./src
RUN mvn package -DskipTests -q

# Stage 2: create the CRaC checkpoint (CRIU needs elevated privileges here)
FROM azul/zulu-openjdk-debian:21-crac AS checkpoint
WORKDIR /app
COPY --from=build /app/target/myapp.jar .
# Run the app, wait for it to start, trigger the checkpoint, then wait for the JVM to exit.
# jcmd accepts the JAR name in place of a PID.
RUN java -XX:CRaCCheckpointTo=/checkpoint -jar myapp.jar & \
    sleep 15 && \
    jcmd myapp.jar JDK.checkpoint && \
    wait

# Stage 3: lean runtime image with the checkpoint baked in
FROM azul/zulu-openjdk-debian:21-crac
WORKDIR /app
# Files that were open at checkpoint time, such as the JAR, are expected at the same path on restore
COPY --from=build /app/target/myapp.jar .
COPY --from=checkpoint /checkpoint /checkpoint
CMD ["java", "-XX:CRaCRestoreFrom=/checkpoint"]

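The checkpoint stage needs elevated privileges so that CRIU can use Linux kernel capabilities such as CAP_SYS_PTRACE and CAP_CHECKPOINT_RESTORE. A plain docker build cannot grant them (builds have no --privileged flag), so either enable BuildKit's security.insecure entitlement for that stage, or take the checkpoint in a container started with docker run --privileged and commit the result. Run this step in a trusted CI/CD environment, not on a shared runner. The resulting runtime image does not need privileged mode, although depending on the kernel and container runtime the restore may still require individual capabilities.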

A CRaC-restored JVM typically resumes in under 200 ms, which makes it a good fit for Kubernetes horizontal pod autoscaling: once a pod is scheduled and its image is pulled, new replicas become ready almost immediately during traffic spikes. The runtime container does not need privileged mode; only the checkpoint creation step does.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-service
spec:
  replicas: 3
  selector:  # required by apps/v1; must match the pod template labels
    matchLabels:
      app: product-service
  template:
    metadata:
      labels:
        app: product-service
    spec:
      containers:
        - name: product-service
          image: myregistry/product-service:crac-1.0
          # No privileged: true needed for restore; only checkpoint creation needs it
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1"
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 1  # CRaC: ready almost immediately
            periodSeconds: 2
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 10
---
# HPA: CRaC makes scale-out fast enough for bursty traffic
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: product-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: product-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
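To see what the 60% CPU target means in practice, it helps to work through the autoscaler's core calculation. The Kubernetes documentation gives it as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with the result kept between minReplicas and maxReplicas. The sketch below implements just that formula; the class and method names are hypothetical, and it deliberately ignores the tolerance band, stabilization windows, and unready-pod handling that the real controller applies.

```java
/** Core of the HPA replica calculation, without tolerance or stabilization. */
class HpaMath {

    /**
     * @param currentReplicas    replicas currently running
     * @param currentUtilization observed average utilization, in percent
     * @param targetUtilization  configured target (averageUtilization), in percent
     * @param minReplicas        lower bound from the HPA spec
     * @param maxReplicas        upper bound from the HPA spec
     */
    static int desiredReplicas(int currentReplicas, double currentUtilization,
                               double targetUtilization, int minReplicas, int maxReplicas) {
        if (targetUtilization <= 0) {
            throw new IllegalArgumentException("targetUtilization must be positive");
        }
        int raw = (int) Math.ceil(currentReplicas * currentUtilization / targetUtilization);
        // Clamp to the bounds configured on the autoscaler.
        return Math.max(minReplicas, Math.min(maxReplicas, raw));
    }

    public static void main(String[] args) {
        // 3 replicas at 90% CPU with a 60% target: ceil(3 * 90 / 60) = ceil(4.5) = 5
        System.out.println(desiredReplicas(3, 90, 60, 2, 20)); // prints 5
        // A large spike is capped by maxReplicas: ceil(10 * 150 / 60) = 25, clamped to 20
        System.out.println(desiredReplicas(10, 150, 60, 2, 20)); // prints 20
    }
}
```

With the manifest's values, a burst that pushes three replicas to 90% CPU asks for two more pods; how quickly those pods can serve traffic is where restore time matters.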

CRaC is one of three approaches to solving the JVM cold start problem. Each makes different trade-offs.

Aspect	CRaC	GraalVM Native Image	AWS SnapStart
Cold start	50–200 ms	10–100 ms	<1 s
Peak throughput	Full JIT — same as JVM	Lower (AOT, no JIT)	Full JIT
Build complexity	Medium (checkpoint step)	High (native build, config)	Low (SAM flag)
Framework compatibility	High — any JVM code	Medium — reflection/proxies need config	High — any JVM code
OS requirement	Linux only (CRIU)	Linux, macOS, Windows	Lambda managed runtime (Linux)
Cloud portability	Any cloud / on-prem	Any cloud / on-prem	AWS Lambda only
Best for	K8s auto-scaling, fast rollouts	Ultra-low memory, CLI tools	Lambda serverless only
