
The simplest way to start is the JMH Maven archetype, which generates a complete project with the right shade-plugin configuration to produce a fat JAR. Alternatively, add the two JMH dependencies to an existing project.

```shell
# Generate a new JMH project from archetype
mvn archetype:generate \
  -DinteractiveMode=false \
  -DarchetypeGroupId=org.openjdk.jmh \
  -DarchetypeArtifactId=jmh-java-benchmark-archetype \
  -DarchetypeVersion=1.37 \
  -DgroupId=io.cscode \
  -DartifactId=my-benchmarks \
  -Dversion=1.0-SNAPSHOT
```

```xml
<!-- Add to an existing pom.xml -->
<dependency>
  <groupId>org.openjdk.jmh</groupId>
  <artifactId>jmh-core</artifactId>
  <version>1.37</version>
</dependency>
<dependency>
  <groupId>org.openjdk.jmh</groupId>
  <artifactId>jmh-generator-annprocess</artifactId>
  <version>1.37</version>
  <scope>provided</scope>
</dependency>
```

```shell
# Run all benchmarks after building the fat JAR
mvn clean package -q
java -jar target/benchmarks.jar

# Run a specific benchmark class
java -jar target/benchmarks.jar StringBenchmark

# Quick mode — 1 fork, fewer iterations (development only)
java -jar target/benchmarks.jar -f 1 -wi 2 -i 3
```

A JMH benchmark is a regular Java class with methods annotated with @Benchmark. At compile time, JMH's annotation processor generates the boilerplate that handles warm-up loops, measurement loops, and result collection. The class must be public, and each benchmark method must be public and non-static.

```java
package io.cscode;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.*;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(2)
@State(Scope.Thread)
public class StringBenchmark {

    // Fields, not literals: javac folds "Hello, " + "World" + "!" into a
    // single compile-time constant, which would leave nothing to measure
    private String greeting = "Hello, ";
    private String name = "World";

    @Benchmark
    public String concatenation() {
        // Measure time of the String + operator
        return greeting + name + "!";
    }

    @Benchmark
    public String builderAppend() {
        return new StringBuilder()
                .append(greeting)
                .append(name)
                .append("!")
                .toString();
    }

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(StringBenchmark.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}
```

Always annotate the class with @Fork, @Warmup, and @Measurement — or provide them via the CLI. Without these, JMH uses defaults that may not be appropriate for your use case.

JMH supports four primary measurement modes. Choosing the right one determines what the numbers mean — always match the mode to the question you are answering.

| Mode | Annotation | Output | Use when you want to know… |
| --- | --- | --- | --- |
| Throughput | Mode.Throughput | ops/second | How many operations per second can this code sustain? |
| AverageTime | Mode.AverageTime | time/op | What is the average cost of one operation? |
| SampleTime | Mode.SampleTime | histogram + percentiles | What is the latency distribution (p50, p95, p99)? |
| SingleShotTime | Mode.SingleShotTime | one-shot time | Cold-start / first-invocation cost (no warm-up) |
| All | Mode.All | all of the above | Exploratory comparison — expensive but complete |
```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

// Measure p99 latency distribution with SampleTime
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class LatencyBenchmark {

    @Benchmark
    public byte[] hashPassword() throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256")
                .digest("secret".getBytes(StandardCharsets.UTF_8));
    }
}
```
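The mode can also be switched from the command line without recompiling, using the -bm flag (accepted values: thrpt, avgt, sample, ss, all). A sketch, assuming the fat JAR from the setup section:

```shell
# Re-run the latency benchmark in SampleTime mode, reporting microseconds
java -jar target/benchmarks.jar -bm sample -tu us LatencyBenchmark

# Cold-start cost: single-shot mode, no warm-up, many forks for a stable average
java -jar target/benchmarks.jar -bm ss -wi 0 -f 20 LatencyBenchmark
```

This is handy for exploratory runs; commit the chosen mode as an annotation once it is settled.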

@State objects hold benchmark inputs and shared resources. JMH creates and manages state instances according to the Scope you choose. Without @State, every benchmark method must create its own data — which benchmarks the allocation, not the logic you care about.

| Scope | Lifetime | Use for |
| --- | --- | --- |
| Scope.Benchmark | One instance shared by all threads (each fork is a separate JVM with its own instance) | Immutable inputs, large datasets loaded once |
| Scope.Thread | One instance per worker thread | ThreadLocal-like state, mutable per-thread counters |
| Scope.Group | One instance shared within a thread group | Producer-consumer benchmarks |
```java
import java.util.*;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class MapState {
    public Map<String, Integer> map;
    public List<String> keys;

    @Setup(Level.Trial)
    public void prepare() {
        map = new HashMap<>();
        keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            String key = "key-" + i;
            map.put(key, i);
            keys.add(key);
        }
    }
}

@BenchmarkMode(Mode.Throughput)
public class MapLookupBenchmark {

    @Benchmark
    public Integer hashMapGet(MapState state) {
        // Benchmark lookup, not map creation; the returned value is
        // consumed implicitly, so no explicit Blackhole is needed
        String key = state.keys.get(
                ThreadLocalRandom.current().nextInt(state.keys.size()));
        return state.map.get(key);
    }
}
```

@Setup and @TearDown methods run outside the measured region and control the lifecycle of @State objects. The Level parameter controls how often they run.

| Level | @Setup runs… | @TearDown runs… |
| --- | --- | --- |
| Level.Trial | Once per fork, before any iterations | Once per fork, after all iterations |
| Level.Iteration | Before each warm-up/measurement iteration | After each iteration |
| Level.Invocation | Before each single method call | After each single call (⚠️ expensive) |
```java
import java.sql.Connection;
import java.sql.DriverManager;

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class ConnectionState {
    Connection conn;

    @Setup(Level.Trial)
    public void openConnection() throws Exception {
        conn = DriverManager.getConnection("jdbc:h2:mem:bench");
    }

    @TearDown(Level.Trial)
    public void closeConnection() throws Exception {
        if (conn != null) conn.close();
    }
}
```

Level.Invocation runs setup/teardown on every single benchmark call. If your method takes nanoseconds, the setup overhead will dominate the measurement. Use Level.Invocation only when per-call reset is essential (e.g., benchmarking sort on a freshly unsorted array).

@Param lets you run the same benchmark across multiple input values, producing a result matrix. JMH automatically cross-products all @Param fields and runs a separate benchmark for each combination. This is far cleaner than copy-pasting benchmark methods.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class SortBenchmark {

    @Param({"100", "1000", "10000", "100000"})
    public int size;

    @Param({"random", "sorted", "reverse"})
    public String order;

    int[] data;

    // Level.Invocation: Arrays.sort mutates its input, so the array must
    // be rebuilt before every call
    @Setup(Level.Invocation)
    public void prepare() {
        data = new int[size];
        switch (order) {
            case "random" -> {
                for (int i = 0; i < size; i++) data[i] = ThreadLocalRandom.current().nextInt();
            }
            case "sorted" -> {
                for (int i = 0; i < size; i++) data[i] = i;
            }
            case "reverse" -> {
                for (int i = 0; i < size; i++) data[i] = size - i;
            }
        }
    }

    @Benchmark
    public void arraySort() {
        Arrays.sort(data);
    }
}
// Produces 4 × 3 = 12 benchmark rows in the results table
```
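Single cells of the parameter matrix can be selected at run time with the -p flag, which is useful while iterating on one case. A sketch, assuming the fat JAR from the setup section:

```shell
# Run only one combination: size=10000, order=random
java -jar target/benchmarks.jar SortBenchmark -p size=10000 -p order=random

# Comma-separated values select a subset of the matrix (2 × 1 = 2 rows)
java -jar target/benchmarks.jar SortBenchmark -p size=100,1000 -p order=random
```

Command-line -p values override the @Param defaults compiled into the class.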

The JIT compiler is very good at eliminating code whose result is never used. If your benchmark computes a value and discards it, the JIT may simply remove the computation entirely, leaving you measuring nothing. A Blackhole is an opaque consumer that prevents this optimisation without introducing significant overhead itself.

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class BlackholeBenchmark {

    @Benchmark
    // BAD — JIT may eliminate the entire sin() call because the result is unused
    public void badSin() {
        Math.sin(1.5);
    }

    @Benchmark
    // GOOD — returning the value forces the JIT to keep the computation
    public double returnSin() {
        return Math.sin(1.5);
    }

    @Benchmark
    // GOOD — explicit Blackhole when you compute multiple intermediate values
    public void multipleValues(Blackhole bh) {
        double a = Math.sin(1.5);
        double b = Math.cos(1.5);
        bh.consume(a);
        bh.consume(b);
    }
}
```

Returning a value from a @Benchmark method is equivalent to passing it to a Blackhole. Use Blackhole explicitly only when you compute multiple values and need to consume each one.

These three annotations control how long JMH runs your benchmark and how many separate JVM processes it uses. Running in multiple forks (separate JVMs) eliminates cross-benchmark JIT pollution — profiling from one benchmark can affect JIT decisions for another if they share a JVM.

```java
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.*;

@Fork(value = 3, jvmArgs = {"-Xms512m", "-Xmx512m", "-XX:+UseG1GC"})
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 2, timeUnit = TimeUnit.SECONDS)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ProductionBenchmark {

    @Benchmark
    public int sumStream() {
        return IntStream.rangeClosed(1, 1_000).sum();
    }

    @Benchmark
    public int sumLoop() {
        int sum = 0;
        for (int i = 1; i <= 1_000; i++) sum += i;
        return sum;
    }
}
```
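The same knobs exist as CLI flags, which override the annotations — convenient for one-off runs with different budgets. A sketch, assuming the fat JAR from the setup section:

```shell
# 3 forks, 5 warm-up and 10 measurement iterations of 2 s each
java -jar target/benchmarks.jar ProductionBenchmark -f 3 -wi 5 -w 2s -i 10 -r 2s

# Pass JVM arguments to the forked benchmark JVMs
java -jar target/benchmarks.jar ProductionBenchmark -jvmArgs "-Xms512m -Xmx512m -XX:+UseG1GC"
```

Annotations document the intended configuration in source control; CLI overrides are best kept for experiments.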

Most unreliable benchmarks fall into one of these traps. Recognising them saves hours of debugging confusing results.

| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Dead-code elimination | Benchmark runs in ~0 ns; the JIT removed the call | Return the value or use Blackhole.consume() |
| Constant folding | Result is always the same — the JIT computes it at compile time | Feed input from @State fields instead of literals |
| Insufficient warm-up | High variance between early and late iterations | Increase @Warmup iterations/time |
| Single fork | Results vary wildly across runs | Use @Fork(3) minimum for publishable numbers |
| Benchmarking allocation | You think you are measuring logic but you are measuring new | Create inputs in @Setup, not inside @Benchmark |
| Shared mutable state | Non-repeatable results depending on execution order | Use the correct @State(Scope); reset in @Setup(Level.Invocation) |
| Running in the IDE | JVM flags differ; results don't match the fat-JAR run | Always benchmark with java -jar benchmarks.jar |
| System load interference | Wildly different results at different times of day | Dedicate a machine or CI node; disable Turbo Boost on laptops |
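To make the constant-folding row concrete, here is a minimal sketch (the class name FoldingBenchmark and the use of Math.log are illustrative, not from the original): with a literal argument the JIT can precompute the result once, whereas reading the operand from a @State field forces the computation on every call.

```java
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class FoldingBenchmark {

    // Read at run time from mutable state — the JIT cannot assume it is constant
    public double x = 1.5;

    @Benchmark
    // BAD — constant input: the JIT may fold log(1.5) into a precomputed value
    public double foldable() {
        return Math.log(1.5);
    }

    @Benchmark
    // GOOD — input comes from a @State field, so the call is actually measured
    public double notFoldable() {
        return Math.log(x);
    }
}
```

If both variants report near-identical times, suspect that folding did not occur for your JVM; if the literal variant is suspiciously fast, it did.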