Contents
- Maven / Gradle Setup
- Your First Benchmark
- Benchmark Modes
- @State — Sharing Data Between Iterations
- @Setup and @TearDown
- @Param — Parameterised Benchmarks
- Blackhole — Preventing Dead-Code Elimination
- @Fork, @Warmup & @Measurement
- Common Pitfalls
The simplest way to start is the JMH Maven archetype, which generates a complete project with the right shade-plugin configuration to produce a fat JAR. Alternatively add the dependency to an existing project.
# Generate a new JMH project from archetype
mvn archetype:generate \
    -DinteractiveMode=false \
    -DarchetypeGroupId=org.openjdk.jmh \
    -DarchetypeArtifactId=jmh-java-benchmark-archetype \
    -DarchetypeVersion=1.37 \
    -DgroupId=io.cscode \
    -DartifactId=my-benchmarks \
    -Dversion=1.0-SNAPSHOT
<!-- Add to an existing pom.xml -->
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.37</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.37</version>
    <scope>provided</scope>
</dependency>
# Run all benchmarks after building the fat JAR
mvn clean package -q
java -jar target/benchmarks.jar
# Run a specific benchmark class
java -jar target/benchmarks.jar StringBenchmark
# Quick mode — 1 fork, fewer iterations (development only)
java -jar target/benchmarks.jar -f 1 -wi 2 -i 3
A JMH benchmark is a regular Java class with methods annotated @Benchmark. JMH generates boilerplate code at compile time (via annotation processing) that handles warm-up loops, measurement loops, and result collection. The class must be public and the benchmark method must be public and non-static.
package io.cscode;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.*;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(2)
@State(Scope.Thread)
public class StringBenchmark {

    // Non-final fields defeat constant folding: concatenating string
    // literals directly would be folded by javac into a single constant,
    // and the benchmark would measure nothing
    String greeting = "Hello, ";
    String name = "World";
    String bang = "!";

    @Benchmark
    public String concatenation() {
        // Measure time of the String + operator
        return greeting + name + bang;
    }

    @Benchmark
    public String builderAppend() {
        return new StringBuilder()
                .append(greeting)
                .append(name)
                .append(bang)
                .toString();
    }

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(StringBenchmark.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}
Always annotate the class with @Fork, @Warmup, and @Measurement — or provide them via CLI. Without these, JMH uses defaults that may not be appropriate for your use case.
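The same settings can also be supplied programmatically when launching from a main method. A minimal sketch (the launcher class name is illustrative; the OptionsBuilder calls are the programmatic equivalents of @Fork, @Warmup, and @Measurement, and override annotation values for that run):

```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

public class BenchmarkLauncher {
    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include("StringBenchmark")          // regex over benchmark names
                .forks(2)                            // like @Fork(2)
                .warmupIterations(3)                 // like @Warmup(iterations = 3, ...)
                .warmupTime(TimeValue.seconds(1))
                .measurementIterations(5)            // like @Measurement(iterations = 5, ...)
                .measurementTime(TimeValue.seconds(1))
                .build();
        new Runner(opt).run();
    }
}
```

Programmatic options sit between annotations and the CLI: they are convenient for IDE-driven experiments, but the numbers you publish should still come from the fat-JAR run.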
JMH supports four primary measurement modes, plus Mode.All, which runs every mode at once. Choosing the right one determines what the numbers mean — always match the mode to the question you are answering.
| Mode | Annotation | Output | Use when you want to know… |
| --- | --- | --- | --- |
| Throughput | Mode.Throughput | ops/second | How many operations per second can this code sustain? |
| AverageTime | Mode.AverageTime | time/op | What is the average cost of one operation? |
| SampleTime | Mode.SampleTime | histogram + percentiles | What is the latency distribution (p50, p95, p99)? |
| SingleShotTime | Mode.SingleShotTime | one-shot time | Cold-start / first-invocation cost (no warm-up) |
| All | Mode.All | all of the above | Exploratory comparison — expensive but complete |
// Measure p99 latency distribution with SampleTime
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class LatencyBenchmark {

    @Benchmark
    public byte[] hashPassword() throws Exception {
        // getInstance declares a checked NoSuchAlgorithmException,
        // so the benchmark method must propagate it
        return MessageDigest.getInstance("SHA-256")
                .digest("secret".getBytes(StandardCharsets.UTF_8));
    }
}
@State objects hold benchmark inputs and shared resources. JMH creates and manages state instances according to the Scope you choose. Without @State, every benchmark method must create its own data — which benchmarks the allocation, not the logic you care about.
| Scope | Lifetime | Use for |
| --- | --- | --- |
| Scope.Benchmark | One instance shared by all benchmark threads (per fork — each fork is a separate JVM) | Immutable inputs, large datasets loaded once |
| Scope.Thread | One instance per worker thread | ThreadLocal-like state, mutable per-thread counters |
| Scope.Group | Shared within a thread group | Producer-consumer benchmarks |
@State(Scope.Benchmark)
public class MapState {

    public Map<String, Integer> map;
    public List<String> keys;

    @Setup(Level.Trial)
    public void prepare() {
        map = new HashMap<>();
        keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            String key = "key-" + i;
            map.put(key, i);
            keys.add(key);
        }
    }
}
@BenchmarkMode(Mode.Throughput)
public class MapLookupBenchmark {

    @Benchmark
    public Integer hashMapGet(MapState state) {
        // Benchmark lookup, not map creation; note that picking
        // the random index is part of the measured time
        String key = state.keys.get(
                ThreadLocalRandom.current().nextInt(state.keys.size()));
        return state.map.get(key);
    }
}
@Setup and @TearDown methods run outside the measured region and control the lifecycle of @State objects. The Level parameter controls how often they run.
| Level | @Setup runs… | @TearDown runs… |
| --- | --- | --- |
| Level.Trial | Once before the entire benchmark run (all forks combined) | Once after all forks |
| Level.Iteration | Before each measurement/warmup iteration | After each iteration |
| Level.Invocation | Before each single method call | After each single call (⚠️ expensive) |
@State(Scope.Thread)
public class ConnectionState {

    Connection conn;

    @Setup(Level.Trial)
    public void openConnection() throws Exception {
        conn = DriverManager.getConnection("jdbc:h2:mem:bench");
    }

    @TearDown(Level.Trial)
    public void closeConnection() throws Exception {
        if (conn != null) conn.close();
    }
}
Level.Invocation runs setup/teardown on every single benchmark call. If your method takes nanoseconds, the setup overhead will dominate the measurement. Use Level.Invocation only when per-call reset is essential (e.g., benchmarking sort on a freshly unsorted array).
@Param lets you run the same benchmark across multiple input values, producing a result matrix. JMH automatically cross-products all @Param fields and runs a separate benchmark for each combination. This is far cleaner than copy-pasting benchmark methods.
@State(Scope.Benchmark)
public class SortBenchmark {

    @Param({"100", "1000", "10000", "100000"})
    public int size;

    @Param({"random", "sorted", "reverse"})
    public String order;

    int[] data;

    @Setup(Level.Invocation)
    public void prepare() {
        // Level.Invocation: sort needs a fresh, unsorted array per call
        data = new int[size];
        switch (order) {
            case "random" -> { for (int i = 0; i < size; i++) data[i] = ThreadLocalRandom.current().nextInt(); }
            case "sorted" -> { for (int i = 0; i < size; i++) data[i] = i; }
            case "reverse" -> { for (int i = 0; i < size; i++) data[i] = size - i; }
        }
    }

    @Benchmark
    public void arraySort() {
        Arrays.sort(data);
    }
}
// Produces 4 × 3 = 12 benchmark rows in the results table
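When iterating on a single combination, you can restrict @Param values from the runner instead of editing the source; OptionsBuilder.param takes the field name and the values to run. A sketch reusing SortBenchmark from above (the launcher class name is illustrative):

```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class ParamOverrideLauncher {
    public static void main(String[] args) throws Exception {
        // Run only size=1000 with random order: 1 × 1 = 1 row
        // instead of the full 4 × 3 matrix
        Options opt = new OptionsBuilder()
                .include("SortBenchmark")
                .param("size", "1000")
                .param("order", "random")
                .build();
        new Runner(opt).run();
    }
}
```

The CLI equivalent is -p size=1000 -p order=random on the fat JAR.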
The JIT compiler is very good at eliminating code whose result is never used. If your benchmark computes a value and discards it, the JIT may simply remove the computation entirely, leaving you measuring nothing. A Blackhole is an opaque consumer that prevents this optimisation without introducing significant overhead itself.
// BAD — the JIT may eliminate the entire sin() call because the result is unused
@Benchmark
public void badSin() {
    Math.sin(1.5);
}

// GOOD — the return value forces the JIT to keep the computation
@Benchmark
public double returnSin() {
    return Math.sin(1.5);
}

// GOOD — explicit Blackhole when you compute multiple intermediate values
@Benchmark
public void multipleValues(Blackhole bh) {
    double a = Math.sin(1.5);
    double b = Math.cos(1.5);
    bh.consume(a);
    bh.consume(b);
}
Returning a value from a @Benchmark method is equivalent to passing it to a Blackhole. Use Blackhole explicitly only when you compute multiple values and need to consume each one.
These three annotations control how long JMH runs your benchmark and how many separate JVM processes it uses. Running in multiple forks (separate JVMs) eliminates cross-benchmark JIT pollution — profiling from one benchmark can affect JIT decisions for another if they share a JVM.
- @Fork(N) — run the benchmark in N separate JVM processes. Results are averaged across forks. Use @Fork(1) only for quick development checks; production benchmarks need @Fork(3+).
- @Warmup(iterations=N, time=T) — run N warm-up iterations of T seconds each before measuring. Warm-up allows the JIT to reach steady state. 3–5 iterations of 1–2 seconds is usually enough.
- @Measurement(iterations=N, time=T) — actual measurement iterations. More iterations reduce variance. 5 iterations of 1–2 seconds is a good default.
@Fork(value = 3, jvmArgs = {"-Xms512m", "-Xmx512m", "-XX:+UseG1GC"})
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 2, timeUnit = TimeUnit.SECONDS)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ProductionBenchmark {

    @Benchmark
    public int sumStream() {
        return IntStream.rangeClosed(1, 1_000).sum();
    }

    @Benchmark
    public int sumLoop() {
        int sum = 0;
        for (int i = 1; i <= 1_000; i++) sum += i;
        return sum;
    }
}
Most unreliable benchmarks fall into one of these traps. Recognising them saves hours of debugging confusing results.
| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Dead-code elimination | Benchmark runs in 0 ns; the JIT removed the call | Return the value or use Blackhole.consume() |
| Constant folding | Result is always the same — the compiler computes it ahead of time | Feed input from @State fields instead of literals |
| Insufficient warm-up | High variance between early and late iterations | Increase @Warmup iterations/time |
| Single fork | Results vary wildly across runs | Use @Fork(3) minimum for publishable numbers |
| Benchmarking allocation | You think you measure logic but you measure object allocation | Create inputs in @Setup, not inside @Benchmark |
| Shared mutable state | Non-repeatable results depending on execution order | Use the correct @State(Scope); reset in @Setup(Level.Invocation) |
| Running in IDE | JVM flags differ; results don't match the fat-JAR run | Always benchmark with java -jar benchmarks.jar |
| System load interference | Wildly different results at different times of day | Dedicate a machine or CI node; disable Turbo Boost on laptops |
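The constant-folding pitfall is visible even without JMH: javac folds literal-only string concatenation into a single interned constant at compile time, which is why a benchmark body built purely from literals measures nothing. A minimal plain-Java demonstration:

```java
public class ConstantFoldingDemo {
    public static void main(String[] args) {
        // Folded by javac: both sides end up as the same interned constant
        String folded = "Hello, " + "World";
        System.out.println(folded == "Hello, World");   // true

        // Not a compile-time constant: args defeats constancy,
        // so the concatenation runs at runtime and builds a new object
        String part = args.length > 0 ? args[0] : "World";
        String runtime = "Hello, " + part;
        System.out.println(runtime == "Hello, World");  // false
    }
}
```

The fix in a benchmark is the one from the table: read inputs from non-final @State fields, so neither javac nor the JIT can prove them constant.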