
The simplest way to start is the JMH Maven archetype, which generates a complete project with the right shade-plugin configuration to produce a fat JAR. Alternatively, add the two JMH dependencies to an existing project.

```shell
# Generate a new JMH project from archetype
mvn archetype:generate \
  -DinteractiveMode=false \
  -DarchetypeGroupId=org.openjdk.jmh \
  -DarchetypeArtifactId=jmh-java-benchmark-archetype \
  -DarchetypeVersion=1.37 \
  -DgroupId=io.cscode \
  -DartifactId=my-benchmarks \
  -Dversion=1.0-SNAPSHOT
```

```xml
<!-- Add to an existing pom.xml -->
<dependency>
  <groupId>org.openjdk.jmh</groupId>
  <artifactId>jmh-core</artifactId>
  <version>1.37</version>
</dependency>
<dependency>
  <groupId>org.openjdk.jmh</groupId>
  <artifactId>jmh-generator-annprocess</artifactId>
  <version>1.37</version>
  <scope>provided</scope>
</dependency>
```

```shell
# Run all benchmarks after building the fat JAR
mvn clean package -q
java -jar target/benchmarks.jar

# Run a specific benchmark class
java -jar target/benchmarks.jar StringBenchmark

# Quick mode — 1 fork, fewer iterations (development only)
java -jar target/benchmarks.jar -f 1 -wi 2 -i 3
```

A JMH benchmark is a regular Java class with methods annotated with @Benchmark. At compile time, JMH's annotation processor generates the boilerplate that handles warm-up loops, measurement loops, and result collection. The class must be public, and each benchmark method must be public and non-static.

```java
package io.cscode;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.*;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(2)
@State(Scope.Thread)
public class StringBenchmark {

    // Fields, not literals: javac folds "Hello, " + "World" + "!" into a
    // single compile-time constant, which would leave nothing to measure
    private String greeting = "Hello, ";
    private String name = "World";

    @Benchmark
    public String concatenation() {
        // Measure time of the String + operator
        return greeting + name + "!";
    }

    @Benchmark
    public String builderAppend() {
        return new StringBuilder()
                .append(greeting)
                .append(name)
                .append("!")
                .toString();
    }

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(StringBenchmark.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}
```

Always annotate the class with @Fork, @Warmup, and @Measurement — or provide them via the CLI. Without these, JMH uses defaults that may not be appropriate for your use case.

JMH supports four primary measurement modes. Choosing the right one determines what the numbers mean — always match the mode to the question you are answering.

| Mode | Annotation | Output | Use when you want to know… |
| --- | --- | --- | --- |
| Throughput | Mode.Throughput | ops/second | How many operations per second can this code sustain? |
| AverageTime | Mode.AverageTime | time/op | What is the average cost of one operation? |
| SampleTime | Mode.SampleTime | histogram + percentiles | What is the latency distribution (p50, p95, p99)? |
| SingleShotTime | Mode.SingleShotTime | one-shot time | Cold-start / first-invocation cost (no warm-up) |
| All | Mode.All | all of the above | Exploratory comparison — expensive but complete |
```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

// Measure p99 latency distribution with SampleTime
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class LatencyBenchmark {

    @Benchmark
    public byte[] hashPassword() throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256")
                .digest("secret".getBytes(StandardCharsets.UTF_8));
    }
}
```
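The mode can also be switched from the command line without recompiling, using the -bm flag (accepted values: thrpt, avgt, sample, ss, all). A sketch, assuming the fat JAR from the setup section:

```shell
# Re-run the latency benchmark in SampleTime mode, reporting microseconds
java -jar target/benchmarks.jar -bm sample -tu us LatencyBenchmark

# Cold-start cost: single-shot mode, no warm-up, many forks for a stable average
java -jar target/benchmarks.jar -bm ss -wi 0 -f 20 LatencyBenchmark
```

This is handy for exploratory runs; commit the chosen mode as an annotation once it is settled.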

@State objects hold benchmark inputs and shared resources. JMH creates and manages state instances according to the Scope you choose. Without @State, every benchmark method must create its own data — which benchmarks the allocation, not the logic you care about.

| Scope | Lifetime | Use for |
| --- | --- | --- |
| Scope.Benchmark | One instance shared by all threads (each fork is a separate JVM with its own instance) | Immutable inputs, large datasets loaded once |
| Scope.Thread | One instance per worker thread | ThreadLocal-like state, mutable per-thread counters |
| Scope.Group | One instance shared within a thread group | Producer-consumer benchmarks |
```java
import java.util.*;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class MapState {
    public Map<String, Integer> map;
    public List<String> keys;

    @Setup(Level.Trial)
    public void prepare() {
        map = new HashMap<>();
        keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            String key = "key-" + i;
            map.put(key, i);
            keys.add(key);
        }
    }
}

@BenchmarkMode(Mode.Throughput)
public class MapLookupBenchmark {

    @Benchmark
    public Integer hashMapGet(MapState state) {
        // Benchmark lookup, not map creation; the returned value is
        // consumed implicitly, so no explicit Blackhole is needed
        String key = state.keys.get(
                ThreadLocalRandom.current().nextInt(state.keys.size()));
        return state.map.get(key);
    }
}
```

@Setup and @TearDown methods run outside the measured region and control the lifecycle of @State objects. The Level parameter controls how often they run.

| Level | @Setup runs… | @TearDown runs… |
| --- | --- | --- |
| Level.Trial | Once per fork, before any iterations | Once per fork, after all iterations |
| Level.Iteration | Before each warm-up/measurement iteration | After each iteration |
| Level.Invocation | Before each single method call | After each single call (⚠️ expensive) |
```java
import java.sql.Connection;
import java.sql.DriverManager;

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class ConnectionState {
    Connection conn;

    @Setup(Level.Trial)
    public void openConnection() throws Exception {
        conn = DriverManager.getConnection("jdbc:h2:mem:bench");
    }

    @TearDown(Level.Trial)
    public void closeConnection() throws Exception {
        if (conn != null) conn.close();
    }
}
```

Level.Invocation runs setup/teardown on every single benchmark call. If your method takes nanoseconds, the setup overhead will dominate the measurement. Use Level.Invocation only when per-call reset is essential (e.g., benchmarking sort on a freshly unsorted array).

@Param lets you run the same benchmark across multiple input values, producing a result matrix. JMH automatically cross-products all @Param fields and runs a separate benchmark for each combination. This is far cleaner than copy-pasting benchmark methods.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class SortBenchmark {

    @Param({"100", "1000", "10000", "100000"})
    public int size;

    @Param({"random", "sorted", "reverse"})
    public String order;

    int[] data;

    // Level.Invocation: Arrays.sort mutates its input, so the array must
    // be rebuilt before every call
    @Setup(Level.Invocation)
    public void prepare() {
        data = new int[size];
        switch (order) {
            case "random" -> {
                for (int i = 0; i < size; i++) data[i] = ThreadLocalRandom.current().nextInt();
            }
            case "sorted" -> {
                for (int i = 0; i < size; i++) data[i] = i;
            }
            case "reverse" -> {
                for (int i = 0; i < size; i++) data[i] = size - i;
            }
        }
    }

    @Benchmark
    public void arraySort() {
        Arrays.sort(data);
    }
}
// Produces 4 × 3 = 12 benchmark rows in the results table
```
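Single cells of the parameter matrix can be selected at run time with the -p flag, which is useful while iterating on one case. A sketch, assuming the fat JAR from the setup section:

```shell
# Run only one combination: size=10000, order=random
java -jar target/benchmarks.jar SortBenchmark -p size=10000 -p order=random

# Comma-separated values select a subset of the matrix (2 × 1 = 2 rows)
java -jar target/benchmarks.jar SortBenchmark -p size=100,1000 -p order=random
```

Command-line -p values override the @Param defaults compiled into the class.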

The JIT compiler is very good at eliminating code whose result is never used. If your benchmark computes a value and discards it, the JIT may simply remove the computation entirely, leaving you measuring nothing. A Blackhole is an opaque consumer that prevents this optimisation without introducing significant overhead itself.

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class BlackholeBenchmark {

    @Benchmark
    // BAD — JIT may eliminate the entire sin() call because the result is unused
    public void badSin() {
        Math.sin(1.5);
    }

    @Benchmark
    // GOOD — returning the value forces the JIT to keep the computation
    public double returnSin() {
        return Math.sin(1.5);
    }

    @Benchmark
    // GOOD — explicit Blackhole when you compute multiple intermediate values
    public void multipleValues(Blackhole bh) {
        double a = Math.sin(1.5);
        double b = Math.cos(1.5);
        bh.consume(a);
        bh.consume(b);
    }
}
```

Returning a value from a @Benchmark method is equivalent to passing it to a Blackhole. Use Blackhole explicitly only when you compute multiple values and need to consume each one.

These three annotations control how long JMH runs your benchmark and how many separate JVM processes it uses. Running in multiple forks (separate JVMs) eliminates cross-benchmark JIT pollution — profiling from one benchmark can affect JIT decisions for another if they share a JVM.

```java
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.*;

@Fork(value = 3, jvmArgs = {"-Xms512m", "-Xmx512m", "-XX:+UseG1GC"})
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 2, timeUnit = TimeUnit.SECONDS)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ProductionBenchmark {

    @Benchmark
    public int sumStream() {
        return IntStream.rangeClosed(1, 1_000).sum();
    }

    @Benchmark
    public int sumLoop() {
        int sum = 0;
        for (int i = 1; i <= 1_000; i++) sum += i;
        return sum;
    }
}
```
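The same knobs exist as CLI flags, which override the annotations — convenient for one-off runs with different budgets. A sketch, assuming the fat JAR from the setup section:

```shell
# 3 forks, 5 warm-up and 10 measurement iterations of 2 s each
java -jar target/benchmarks.jar ProductionBenchmark -f 3 -wi 5 -w 2s -i 10 -r 2s

# Pass JVM arguments to the forked benchmark JVMs
java -jar target/benchmarks.jar ProductionBenchmark -jvmArgs "-Xms512m -Xmx512m -XX:+UseG1GC"
```

Annotations document the intended configuration in source control; CLI overrides are best kept for experiments.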

Most unreliable benchmarks fall into one of these traps. Recognising them saves hours of debugging confusing results.

| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Dead-code elimination | Benchmark runs in ~0 ns; the JIT removed the call | Return the value or use Blackhole.consume() |
| Constant folding | Result is always the same — the JIT computes it at compile time | Feed input from @State fields instead of literals |
| Insufficient warm-up | High variance between early and late iterations | Increase @Warmup iterations/time |
| Single fork | Results vary wildly across runs | Use @Fork(3) minimum for publishable numbers |
| Benchmarking allocation | You think you are measuring logic but you are measuring new | Create inputs in @Setup, not inside @Benchmark |
| Shared mutable state | Non-repeatable results depending on execution order | Use the correct @State(Scope); reset in @Setup(Level.Invocation) |
| Running in the IDE | JVM flags differ; results don't match the fat-JAR run | Always benchmark with java -jar benchmarks.jar |
| System load interference | Wildly different results at different times of day | Dedicate a machine or CI node; disable Turbo Boost on laptops |
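To make the constant-folding row concrete, here is a minimal sketch (the class name FoldingBenchmark and the use of Math.log are illustrative, not from the original): with a literal argument the JIT can precompute the result once, whereas reading the operand from a @State field forces the computation on every call.

```java
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class FoldingBenchmark {

    // Read at run time from mutable state — the JIT cannot assume it is constant
    public double x = 1.5;

    @Benchmark
    // BAD — constant input: the JIT may fold log(1.5) into a precomputed value
    public double foldable() {
        return Math.log(1.5);
    }

    @Benchmark
    // GOOD — input comes from a @State field, so the call is actually measured
    public double notFoldable() {
        return Math.log(x);
    }
}
```

If both variants report near-identical times, suspect that folding did not occur for your JVM; if the literal variant is suspiciously fast, it did.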