volatile solves the visibility problem — a write to a volatile field is immediately visible to all threads. But it does nothing for atomicity. An operation like counter++ is not a single instruction; it is three separate steps — read the current value, add one, write it back. Two threads can both read the same value, both compute the same incremented result, and both write it back, silently losing one update. The java.util.concurrent.atomic package solves this by mapping operations directly onto hardware Compare-And-Set (CAS) instructions. CAS atomically reads a memory location, compares it to an expected value, and writes a new value only if the comparison succeeds — all as one uninterruptible CPU operation. If another thread changed the value first, CAS fails and the caller retries with the fresh value, making the overall operation correct without ever taking a lock.

import java.util.concurrent.atomic.AtomicInteger;

// volatile guarantees that every thread sees the latest value,
// but it does NOT make compound operations atomic.
class VolatileBroken {
    private volatile int counter = 0;

    // counter++ is THREE operations: READ, INCREMENT, WRITE.
    // Two threads can both read 5, both increment to 6, both write 6: lost update!
    void increment() {
        counter++; // NOT atomic even with volatile
    }
}

// synchronized fixes it but adds lock overhead and can block threads:
class SynchronizedCounter {
    private int counter = 0;

    synchronized void increment() { counter++; }
    synchronized int get() { return counter; }
}

// AtomicInteger fixes it without locks by using a hardware CAS instruction:
class AtomicCounter {
    private final AtomicInteger counter = new AtomicInteger(0);

    void increment() { counter.incrementAndGet(); } // atomic, no lock
    int get() { return counter.get(); }             // always consistent
}

// CAS (Compare-And-Set) hardware instruction:
//   "If the current value equals expectedValue, set it to newValue atomically."
//   If another thread changed the value first, CAS fails; retry with the new value.
//   This is how all Atomic* classes work under the hood.

// Performance rule of thumb:
//   Low contention       → AtomicInteger ≈ synchronized (both fast)
//   High contention      → AtomicInteger >> synchronized (no blocking)
//   Very high contention → LongAdder >> AtomicInteger (see LongAdder section)

On x86, CAS compiles to a single LOCK CMPXCHG instruction. On ARM it is either a load-exclusive/store-exclusive pair (LDXR/STXR) wrapped in a short retry loop or, from ARMv8.1 onward, a single CAS instruction. Either way the operation is atomic at the hardware level, so no OS-level lock or context switch is needed, which is why atomic classes usually outperform synchronized under contention.
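The lost-update scenario described above is easy to reproduce. The following is a minimal sketch rather than anything from the original text: the class name `LostUpdateDemo` and its `run` method are hypothetical, and the thread and iteration counts are arbitrary. It increments a volatile int and an AtomicInteger side by side from several threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: compares volatile counter++ with AtomicInteger.
class LostUpdateDemo {
    private static volatile int plain = 0;
    private static final AtomicInteger atomic = new AtomicInteger();

    // Returns { volatileResult, atomicResult } after all workers finish.
    static int[] run(int threads, int perThread) {
        plain = 0;
        atomic.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    plain++;                  // read, add, write: updates can be lost
                    atomic.incrementAndGet(); // one atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { plain, atomic.get() };
    }

    public static void main(String[] args) {
        int[] r = run(4, 100_000);
        System.out.println("volatile: " + r[0] + " (varies), atomic: " + r[1]);
    }
}
```

The atomic result always equals threads × perThread. The volatile result can never exceed that figure and on a multi-core machine usually falls short of it, though by how much differs from run to run.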

AtomicInteger wraps an int and exposes a rich set of atomic operations. The naming convention is consistent: getAndX() returns the old value and then applies X, while xAndGet() applies X and then returns the new value, mirroring the distinction between post-increment (i++) and pre-increment (++i). The Java 8 additions getAndUpdate() and updateAndGet() accept an IntUnaryOperator, letting you express arbitrary atomic transformations without a manual CAS loop; because the function may be re-applied when a CAS attempt fails, it should be free of side effects. At the foundation of all these methods is compareAndSet(expected, update), the primitive CAS operation: it returns true and updates the value only when the current value equals expected. Every lock-free algorithm ultimately builds on this primitive.

import java.util.concurrent.atomic.AtomicInteger;

AtomicInteger ai = new AtomicInteger(0);

// --- Basic get/set ---
int val = ai.get();   // read current value
ai.set(10);           // unconditional write (still volatile-visible)
ai.lazySet(10);       // relaxed write, eventually visible (faster, rare use)

// --- Increment / Decrement ---
int prev  = ai.getAndIncrement(); // returns OLD value, then increments (like i++)
int next  = ai.incrementAndGet(); // increments, returns NEW value (like ++i)
int prev2 = ai.getAndDecrement(); // returns OLD, then decrements
int next2 = ai.decrementAndGet(); // decrements, returns NEW

// --- Add ---
int prev3 = ai.getAndAdd(5); // returns OLD, then adds 5
int next3 = ai.addAndGet(5); // adds 5, returns NEW

// --- Update with a function (Java 8+) ---
// getAndUpdate: apply an IntUnaryOperator, return the old value
int old = ai.getAndUpdate(x -> x * 2);
// updateAndGet: apply an IntUnaryOperator, return the new value
int newVal = ai.updateAndGet(x -> x * 2);
// accumulateAndGet: combine the current value with the given value using an IntBinaryOperator
int result = ai.accumulateAndGet(3, Integer::max); // atomically: value = max(value, 3)

// --- compareAndSet (CAS) ---
boolean swapped = ai.compareAndSet(10, 20);
// If current value == 10, set to 20 and return true; else return false

// Manual CAS loop (pattern used internally by all getAndUpdate methods):
int prev4, next4;
do {
    prev4 = ai.get();
    next4 = prev4 + 5; // compute new value
} while (!ai.compareAndSet(prev4, next4)); // retry if another thread changed it

// Real-world example: thread-safe request counter
class RequestTracker {
    private final AtomicInteger total  = new AtomicInteger(0);
    private final AtomicInteger errors = new AtomicInteger(0);

    void recordSuccess() { total.incrementAndGet(); }
    void recordError()   { total.incrementAndGet(); errors.incrementAndGet(); }

    int getTotal()  { return total.get(); }
    int getErrors() { return errors.get(); }

    // The two counters are read separately, so under concurrent updates
    // the ratio is an approximation, not an exact snapshot.
    double errorRate() {
        int t = total.get();
        return t == 0 ? 0.0 : (double) errors.get() / t;
    }
}
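A manual CAS loop earns its keep when the update is conditional, for example "increment only while below a limit". The sketch below is an illustration under assumptions: `BoundedCounter`, `tryAcquire`, and `release` are hypothetical names, not part of the JDK or the original text. It combines a hand-written compareAndSet loop with updateAndGet.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: a permit counter that never exceeds a fixed limit.
class BoundedCounter {
    private final AtomicInteger inUse = new AtomicInteger();
    private final int limit;

    BoundedCounter(int limit) { this.limit = limit; }

    // Conditional increment: give up when full, retry when a CAS loses a race.
    boolean tryAcquire() {
        int current;
        do {
            current = inUse.get();
            if (current >= limit) return false; // full, nothing to update
        } while (!inUse.compareAndSet(current, current + 1));
        return true;
    }

    // Decrement but never go below zero; the lambda is side-effect free
    // because it may be re-applied after a failed CAS.
    void release() {
        inUse.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    int inUse() { return inUse.get(); }
}
```

The early return inside the loop is the part a plain incrementAndGet() cannot express: the check and the write have to happen as one atomic step, otherwise two threads could both see room for one more permit and both take it.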

AtomicLong has an identical API to AtomicInteger but operates on long values, making it the right choice for counters and sequence generators where 32-bit overflow is a concern — distributed ID generators, byte-transfer statistics, and monotonic timestamps all need 64-bit precision. AtomicBoolean is not simply a thread-safe boolean wrapper; its value comes from compareAndSet(false, true), which atomically performs a "test-and-set" — the call succeeds for exactly one thread even when many invoke it simultaneously. This makes it ideal for one-shot initialization guards, shutdown flags, and any scenario where a state transition must happen exactly once regardless of how many threads race to trigger it.

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicBoolean;

// --- AtomicLong: identical API to AtomicInteger, but for long ---
AtomicLong sequence = new AtomicLong(0);

// ID generator: unique within this JVM, safe across threads
long nextId() { return sequence.incrementAndGet(); }

// Timestamp tracking: keep the maximum seen timestamp
AtomicLong maxTimestamp = new AtomicLong(0);

void observeTimestamp(long ts) {
    // atomically update to max without a lock:
    maxTimestamp.accumulateAndGet(ts, Math::max);
}

// Byte counter for I/O statistics
class IoStats {
    private final AtomicLong bytesRead    = new AtomicLong(0);
    private final AtomicLong bytesWritten = new AtomicLong(0);

    void addRead(long n)    { bytesRead.addAndGet(n); }
    void addWritten(long n) { bytesWritten.addAndGet(n); }

    long totalRead()    { return bytesRead.get(); }
    long totalWritten() { return bytesWritten.get(); }
}

// --- AtomicBoolean: one-shot flag, init guard ---
AtomicBoolean started = new AtomicBoolean(false);

// Only the FIRST thread to call this will run initSystem(); the rest are skipped.
// (initSystem() and releaseResources() below stand in for application code.)
void initOnce() {
    if (started.compareAndSet(false, true)) { // CAS: only succeeds once
        initSystem();
    }
}

// Shutdown hook: ensure shutdown runs exactly once even if multiple threads call it
class Service {
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    void stop() {
        if (stopped.compareAndSet(false, true)) {
            releaseResources(); // called exactly once
        }
    }
}

// AtomicBoolean is NOT just a boolean wrapper; its compareAndSet is the key:
// compareAndSet(false, true) is an atomic "test and set", used for one-shot guards
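To see the "exactly one winner" guarantee in action, the following sketch races several threads through the same guard. The names `OneShotDemo` and `countWinners` are hypothetical, and the CountDownLatch is there only to release all threads at roughly the same moment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: many threads race on compareAndSet(false, true).
class OneShotDemo {
    static int countWinners(int threads) {
        AtomicBoolean started = new AtomicBoolean(false);
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch go = new CountDownLatch(1);

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    go.await(); // wait for the starting gun
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (started.compareAndSet(false, true)) {
                    winners.incrementAndGet(); // the one-time work would go here
                }
            });
            workers[i].start();
        }

        go.countDown(); // release all threads together
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }
}
```

However many threads take part, `countWinners` returns 1: the first successful CAS flips the flag, and every later call sees true and fails the comparison.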

AtomicReference<V> brings the same CAS semantics to object references. Its compareAndSet compares by reference identity (==), not by equals(), which is important to keep in mind. A common pattern is storing an immutable snapshot object and swapping the entire reference atomically — this gives readers a consistent view with no locking, and writers simply produce a new snapshot and CAS it in. AtomicReference is also the building block for lock-free data structures such as linked lists and stacks, where a push or pop is implemented as a CAS on the head pointer. One hazard to know: the ABA problem, where a value changes from A to B and back to A between a read and a CAS, causing the CAS to succeed incorrectly. AtomicStampedReference addresses this by pairing the reference with an integer version stamp.

import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

// AtomicReference<V>: same CAS semantics for object references
AtomicReference<String> current = new AtomicReference<>("initial");
current.set("updated");
String old = current.getAndSet("final"); // returns "updated", sets "final"

// CAS on a reference compares by identity (==), not equals().
// It works here only because identical string literals are interned to one object.
boolean ok = current.compareAndSet("final", "reset"); // true if current == "final"

// Immutable config hot-reload pattern (records require Java 16+):
record AppConfig(String host, int port, int timeout) {}

class ConfigHolder {
    private final AtomicReference<AppConfig> config =
        new AtomicReference<>(new AppConfig("localhost", 8080, 30));

    AppConfig get() { return config.get(); }

    // Safe reload: readers always see a consistent config snapshot
    void reload(AppConfig newConfig) {
        config.set(newConfig); // atomic reference swap, no partial state visible
    }
}

// ABA problem: CAS sees value A, misses the A→B→A change, and succeeds incorrectly.
// Example: stack push/pop with AtomicReference<Node> can corrupt if Node is reused.
// Fix: AtomicStampedReference attaches a version stamp to detect ABA
AtomicStampedReference<String> stamped = new AtomicStampedReference<>("A", 0);
int[] stampHolder = new int[1];
String val = stamped.get(stampHolder); // reads value AND current stamp
int stamp = stampHolder[0];
// CAS only succeeds if BOTH value and stamp match:
boolean swapped = stamped.compareAndSet("A", "B", stamp, stamp + 1);

// Lock-free stack using AtomicReference:
class LockFreeStack<T> {
    private static class Node<T> {
        final T item;
        final Node<T> next;
        Node(T item, Node<T> next) { this.item = item; this.next = next; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(T item) {
        Node<T> newHead;
        Node<T> oldHead;
        do {
            oldHead = top.get();
            newHead = new Node<>(item, oldHead);
        } while (!top.compareAndSet(oldHead, newHead)); // retry if another push raced
    }

    T pop() {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = top.get();
            if (oldHead == null) return null; // empty
            newHead = oldHead.next;
        } while (!top.compareAndSet(oldHead, newHead));
        return oldHead.item;
    }
}

The ABA problem occurs when a CAS reads value A, another thread changes it to B and back to A, and the CAS incorrectly succeeds. Use AtomicStampedReference or AtomicMarkableReference when nodes or objects can be recycled or reused. The stack above allocates a fresh Node on every push, so a node still referenced by a racing thread cannot reappear as a different element.
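The ABA sequence can be replayed deterministically on a single thread, which makes the difference between the two classes easy to check. This is a sketch under assumptions: `AbaDemo` and its two methods are hypothetical names, and the interleaving of "thread 1" and "thread 2" is simulated by ordinary sequential calls.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

// Hypothetical demo: replays A -> B -> A between a read and a CAS.
class AbaDemo {
    // Plain AtomicReference: the CAS cannot tell that anything happened.
    static boolean plainCasAfterAba() {
        String a = "A", b = "B";
        AtomicReference<String> ref = new AtomicReference<>(a);

        String seen = ref.get(); // "thread 1" reads A
        ref.set(b);              // "thread 2" changes A -> B
        ref.set(a);              // "thread 2" changes B -> A

        return ref.compareAndSet(seen, "C"); // same reference again, so it succeeds
    }

    // AtomicStampedReference: the stamp has moved on, so the CAS is rejected.
    static boolean stampedCasAfterAba() {
        String a = "A", b = "B";
        AtomicStampedReference<String> ref = new AtomicStampedReference<>(a, 0);

        int[] stampHolder = new int[1];
        String seen = ref.get(stampHolder); // "thread 1" reads A with stamp 0
        int seenStamp = stampHolder[0];

        ref.set(b, 1); // "thread 2": A -> B, stamp 1
        ref.set(a, 2); // "thread 2": B -> A, stamp 2

        // Reference matches but the stamp is now 2, not 0.
        return ref.compareAndSet(seen, "C", seenStamp, seenStamp + 1);
    }
}
```

`plainCasAfterAba()` returns true and `stampedCasAfterAba()` returns false. The stamp only helps if every writer bumps it on each change, which is a convention the calling code has to keep.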

AtomicIntegerArray provides atomic operations on individual elements of an integer array without requiring a lock on the entire array. This is key for fine-grained concurrent access: multiple threads can safely update different indices simultaneously, with full CAS semantics per element. The array is copied at construction time, so changes to the original source array have no effect. AtomicLongArray and AtomicReferenceArray offer the same capabilities for long values and object references respectively. The typical use cases are per-bucket counters, histograms, and sharded data structures where each slot is independently hot and locking the whole collection would be a bottleneck.

import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.AtomicReferenceArray;

// AtomicIntegerArray: array where each element can be updated atomically
int[] initialValues = {1, 2, 3, 4, 5};
AtomicIntegerArray arr = new AtomicIntegerArray(initialValues);
// Note: a copy is made, so modifying initialValues after this has no effect

// Element-level atomic operations:
int v = arr.get(0);                        // read element 0
arr.set(0, 10);                            // write element 0
int prev = arr.getAndIncrement(2);         // atomically increment element 2
arr.addAndGet(3, 5);                       // atomically add 5 to element 3
boolean ok = arr.compareAndSet(4, 5, 99);  // CAS element 4: if 5, set to 99
int len = arr.length();                    // array length

// Use case: per-bucket counters (sharded counter / histogram)
class Histogram {
    private static final int BUCKETS = 100;
    private final AtomicIntegerArray buckets = new AtomicIntegerArray(BUCKETS);

    void record(int value) {
        // Clamp to [0, BUCKETS - 1] so negative or very large values cannot index out of range
        int bucket = Math.max(0, Math.min(value / 10, BUCKETS - 1));
        buckets.incrementAndGet(bucket);
    }

    int count(int bucket) { return buckets.get(bucket); }

    void print() {
        for (int i = 0; i < BUCKETS; i++) {
            System.out.printf("Bucket %3d: %d%n", i * 10, buckets.get(i));
        }
    }
}

// AtomicLongArray and AtomicReferenceArray have the same API for longs and objects:
AtomicLongArray timestamps = new AtomicLongArray(10);
AtomicReferenceArray<String> names = new AtomicReferenceArray<>(10);

timestamps.set(0, System.currentTimeMillis());
names.compareAndSet(0, null, "Alice"); // set element 0 from null to "Alice"
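The per-element guarantee matters most when several threads hit the same slots. The sketch below, with the hypothetical names `BucketDemo` and `fill`, has every thread cycle through all buckets, so each slot is contended, and then copies the totals into a plain array once the threads have finished.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical demo: several threads increment overlapping buckets.
class BucketDemo {
    // Each thread performs perThread increments, spread round-robin over the buckets.
    static int[] fill(int threads, int perThread, int buckets) {
        AtomicIntegerArray counts = new AtomicIntegerArray(buckets);

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counts.incrementAndGet(j % buckets); // atomic per element
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }

        // All writers are done, so this snapshot is exact.
        int[] snapshot = new int[buckets];
        for (int i = 0; i < buckets; i++) {
            snapshot[i] = counts.get(i);
        }
        return snapshot;
    }
}
```

With 4 threads, 1000 increments each, and 10 buckets, every bucket ends at 400. Replacing the AtomicIntegerArray with a plain int[] and `counts[j % buckets]++` would reintroduce the lost-update problem on each slot.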

Under heavy contention, many threads hammering a single AtomicLong repeatedly fail their CAS retries, burning CPU cycles without making progress. LongAdder eliminates this hot-spot by maintaining a distributed cell array: each thread preferentially updates its own cell, and contention is spread across cells rather than concentrated on one memory location. Calling sum() adds the base value and all cells together to produce the total. The trade-off is that sum() is not atomic with respect to concurrent increments — a read mid-update may observe a value slightly behind the true count. This is perfectly acceptable for metrics, hit counters, and rate accumulators where the read frequency is much lower than the write frequency and exact intermediate precision is not required. When you need CAS semantics or exact snapshots, use AtomicLong instead.

import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAccumulator;

// Problem with AtomicLong under heavy contention:
//   All threads compete on ONE memory location; CAS failures cause retries → CPU waste

// LongAdder solution: updates are spread over several cells; sum() adds them up
//   Under low contention:  behaves like AtomicLong (one base cell)
//   Under high contention: expands to multiple cells automatically (threads are hashed across them)

LongAdder counter = new LongAdder();

// Producer threads:
counter.increment();  // increment by 1
counter.add(5);       // add arbitrary delta
counter.decrement();  // subtract 1

// Consumer reads the sum:
long total = counter.sum();                // adds all cells, NOT atomic with increments
counter.reset();                           // reset to 0 (not atomic, use carefully)
long sumAndReset = counter.sumThenReset(); // sum() followed by reset(); also not atomic,
                                           // intended for quiescent points between updates

// Benchmark insight (illustrative figures; actual numbers depend on hardware and JVM):
//   16 threads doing 10M increments each:
//   synchronized int  ~3.5 s
//   AtomicLong        ~1.2 s
//   LongAdder         ~0.3 s  (fastest for pure counting)

// LongAccumulator: generalized LongAdder with a custom binary operator
LongAccumulator maxVal = new LongAccumulator(Long::max, Long.MIN_VALUE);
maxVal.accumulate(42);
maxVal.accumulate(17);
maxVal.accumulate(99);
System.out.println(maxVal.get()); // 99, a thread-safe running maximum

// DoubleAdder: same idea for doubles (floating-point summation)
DoubleAdder total2 = new DoubleAdder();
total2.add(1.5);
total2.add(2.3);
System.out.println(total2.sum()); // 3.8

// When NOT to use LongAdder:
//   - sum() is NOT linearizable with add(); reading mid-increment may be inaccurate
//   - If you need CAS semantics (compareAndSet) → use AtomicLong
//   - LongAdder uses more memory than AtomicLong (array of cells)

LongAdder is the right choice for web-server hit counters, metrics collectors, and any scenario where you are writing far more often than reading the total. For sequences (IDs, version stamps) where exact intermediate values matter, stick with AtomicLong.
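For the hit-counter use case, a common idiom pairs LongAdder with ConcurrentHashMap so that each key gets its own adder. The following is a sketch under assumptions: `HitCounters`, `record`, `count`, and `hammer` are hypothetical names, and the "/home" key is only sample data.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical example: per-key hit counters backed by LongAdder.
class HitCounters {
    private final ConcurrentHashMap<String, LongAdder> hits = new ConcurrentHashMap<>();

    // computeIfAbsent creates the adder once per key; increment() is the hot path.
    void record(String key) {
        hits.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    // sum() is exact only when no increments are in flight.
    long count(String key) {
        LongAdder adder = hits.get(key);
        return adder == null ? 0L : adder.sum();
    }

    // Drives one key from several threads and reads the total after they finish.
    static long hammer(int threads, int perThread) {
        HitCounters counters = new HitCounters();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counters.record("/home");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters.count("/home");
    }
}
```

Because `hammer` reads the sum only after every worker has been joined, the result is exact. A reader polling `count` while writers are still running would get a value that may lag slightly, which is the trade-off described above.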

A concrete benchmark is the clearest way to understand when each counter type pays off. All three implementations (synchronized, AtomicInteger, and LongAdder) produce a correct final count; the difference is throughput under concurrent load. The synchronized counter serializes all increments through a monitor, so under contention threads queue up and may context-switch. AtomicInteger avoids the lock but still has every thread competing on one cache line, causing CAS retries under high contention. LongAdder spreads the updates across cells and dominates in write-heavy scenarios. For contrast, a plain long incremented with no synchronization at all would not even pass the correctness check: updates are lost and the final count comes out wrong.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.*;
import java.util.function.LongSupplier;

// Full example: comparing synchronized, AtomicInteger, and LongAdder counters
// under concurrent load from multiple threads.

// 1. Synchronized (baseline)
class SyncCounter {
    private long count = 0;
    synchronized void inc() { count++; }
    synchronized long get() { return count; }
}

// 2. AtomicInteger (lock-free)
class AtomicCounter {
    private final AtomicInteger count = new AtomicInteger(0);
    void inc() { count.incrementAndGet(); }
    long get() { return count.get(); }
}

// 3. LongAdder (best for write-heavy)
class AdderCounter {
    private final LongAdder count = new LongAdder();
    void inc() { count.increment(); }
    long get() { return count.sum(); }
}

// Driver: 8 threads, 1 million increments each
public class LockFreeDemo {
    static final int THREADS = 8;
    static final int OPS_PER_THREAD = 1_000_000;

    static long benchmark(Runnable inc, LongSupplier get) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        long start = System.nanoTime();
        CountDownLatch latch = new CountDownLatch(THREADS);
        for (int i = 0; i < THREADS; i++) {
            pool.submit(() -> {
                for (int j = 0; j < OPS_PER_THREAD; j++) inc.run();
                latch.countDown();
            });
        }
        latch.await();
        long elapsed = System.nanoTime() - start;
        pool.shutdown();

        // Explicit check instead of `assert`, which is disabled unless the JVM runs with -ea
        long expected = (long) THREADS * OPS_PER_THREAD;
        if (get.getAsLong() != expected) {
            throw new IllegalStateException("Counter is wrong!");
        }
        return elapsed / 1_000_000; // ms
    }

    public static void main(String[] args) throws Exception {
        SyncCounter sync = new SyncCounter();
        AtomicCounter atomic = new AtomicCounter();
        AdderCounter adder = new AdderCounter();

        System.out.println("Synchronized:  " + benchmark(sync::inc, sync::get) + " ms");
        System.out.println("AtomicInteger: " + benchmark(atomic::inc, atomic::get) + " ms");
        System.out.println("LongAdder:     " + benchmark(adder::inc, adder::get) + " ms");

        // Typical output on 8-core machine (illustrative; varies by hardware and JVM):
        //   Synchronized:  620 ms
        //   AtomicInteger: 210 ms
        //   LongAdder:     55 ms
    }
}

All three implementations produce the correct final count; the difference is throughput. In runs like the one above, LongAdder is roughly 4–10x faster than AtomicInteger under high contention because threads rarely contend on the same cell. Treat the numbers as indicative only: this driver has no warm-up phase, so JIT compilation is part of what gets measured, and a harness such as JMH is the better tool for real comparisons.