Contents
- Internal Structure — the Bucket Array
- Hashing — from Key to Bucket Index
- Collision Handling — Chaining
- Java 8 Treeification
- Load Factor & Resize
- equals & hashCode Contract for Keys
- ConcurrentModificationException
- Initial Capacity & Sizing Tips
## Internal Structure — the Bucket Array

A HashMap is backed by an array of buckets (internally called Node[] table). Each bucket is either empty (null), a single node, a linked list of nodes (collision chain), or — since Java 8 — a red-black tree. Every node stores the key, the value, the key's hash, and a pointer to the next node in the same bucket.
```java
// Conceptual structure (simplified from OpenJDK source)
class HashMap<K, V> {
    // The bucket array — default initial capacity is 16
    Node<K, V>[] table;
    int size;         // number of key-value pairs
    int threshold;    // resize when size > threshold (= capacity * loadFactor)
    float loadFactor; // default 0.75

    static class Node<K, V> {
        final int hash;  // cached hash of the key
        final K key;
        V value;
        Node<K, V> next; // next node in the same bucket (collision chain)
    }

    // Java 8: buckets with >= 8 entries become TreeNode (red-black tree)
    static final class TreeNode<K, V> extends Node<K, V> {
        TreeNode<K, V> parent, left, right;
        boolean red;
    }
}
```
The bucket array size is always a power of two. This allows the bucket index to be computed with a fast bitwise AND ((n - 1) & hash) instead of a modulo operation; the saving per call is small, but it applies to every put and get.
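A quick sketch of why the mask works: for a power-of-two n, (n - 1) & hash equals the non-negative remainder of hash divided by n, even for negative hashes. PowerOfTwoIndex is an illustrative name, not JDK code:

```java
// Sketch: (n - 1) & hash is equivalent to a (non-negative) modulo
// when n is a power of two. For n = 16, n - 1 = 0b1111 masks off
// everything but the low 4 bits, which is exactly the remainder.
public class PowerOfTwoIndex {
    public static void main(String[] args) {
        int n = 16; // table length, always a power of two
        int[] hashes = {5, 21, -7, 2112, 0x7fffffff};
        for (int h : hashes) {
            int byMask = (n - 1) & h;
            int byMod = Math.floorMod(h, n); // floorMod handles negative hashes
            System.out.println(h + " -> index " + byMask);
            assert byMask == byMod;
        }
    }
}
```

Note that plain `h % n` would give a negative result for negative hashes, which is one more reason the mask form is used.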
## Hashing — from Key to Bucket Index

When you call put(key, value), HashMap computes the bucket index in two steps. First, it calls key.hashCode() and then applies a supplemental hash function that mixes the upper bits of the hash code into the lower bits. This spreads keys more evenly across buckets, especially when the original hashCode() has poor distribution in the low-order bits. Second, it applies a bitwise AND with (capacity - 1) to map the hash to a valid bucket index.
```java
// Simplified from OpenJDK HashMap source
static final int hash(Object key) {
    int h;
    // XOR upper 16 bits into lower 16 bits — mixes bits for better distribution
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// Bucket index: (n - 1) & hash, where n = table.length (always a power of two)
// Example with capacity 16 (n = 16, n - 1 = 15 = 0b00001111):
// hash  = 0b10110101_11001010_01101001_00110101
// (n-1) = 0b00000000_00000000_00000000_00001111
// index = 0b00000000_00000000_00000000_00000101 = 5

// Demonstration: two keys with the same hash code (collision)
HashMap<String, Integer> map = new HashMap<>();
map.put("Aa", 1); // "Aa".hashCode() = 2112
map.put("BB", 2); // "BB".hashCode() = 2112 — same hashCode!
// Both map to the same bucket — handled by chaining (see next section)
System.out.println(map.get("Aa")); // 1
System.out.println(map.get("BB")); // 2 — found via equals() in the chain
```
## Collision Handling — Chaining

When two keys hash to the same bucket, they form a linked list (a chain) at that bucket. On get(key), HashMap walks the chain, calling key.equals(candidate) on each node until it finds a match or exhausts the list. This is why collisions degrade performance: the O(1) average becomes O(k), where k is the chain length.
```java
import java.util.*;

// Deliberately bad hashCode — everything hashes to the same bucket
public class AllSameBucket {
    private final int id;
    AllSameBucket(int id) { this.id = id; }
    @Override public int hashCode() { return 42; } // TERRIBLE — all in one bucket
    @Override public boolean equals(Object o) {
        return o instanceof AllSameBucket b && this.id == b.id;
    }
}

// Performance demonstration
Map<AllSameBucket, String> badMap = new HashMap<>();
for (int i = 0; i < 10_000; i++) {
    badMap.put(new AllSameBucket(i), "v" + i);
}
// Before Java 8, every get() on badMap walked a chain of up to 10,000 nodes — O(n).
// Java 8 treeifies the bucket (next section), but lookups are still far slower
// than with a well-distributed hashCode.

// Lookups with good vs bad hashCode
long start = System.nanoTime();
badMap.get(new AllSameBucket(9_999));
System.out.println("Bad: " + (System.nanoTime() - start) + " ns");

Map<Integer, String> goodMap = new HashMap<>();
for (int i = 0; i < 10_000; i++) goodMap.put(i, "v" + i);
start = System.nanoTime();
goodMap.get(9_999);
System.out.println("Good: " + (System.nanoTime() - start) + " ns");
// A single-run nanoTime comparison is only a rough indication; expect the bad
// map to be noticeably slower, not a precise multiple.
```
## Java 8 Treeification

Java 8 introduced a safeguard against pathological collision attacks and accidentally bad hash functions: when a bucket's chain reaches 8 entries and the table has at least 64 buckets, the chain is converted to a red-black tree (below 64 buckets, HashMap resizes instead of treeifying). This caps worst-case lookup at O(log n) even with every key in the same bucket, instead of O(n). The conversion is undone during a later resize: if splitting a tree bin leaves it with 6 or fewer nodes, it reverts to a linked list.
```java
// Treeification thresholds (from HashMap source)
static final int TREEIFY_THRESHOLD = 8;     // chain → tree when length >= 8
static final int UNTREEIFY_THRESHOLD = 6;   // tree → chain when length <= 6
static final int MIN_TREEIFY_CAPACITY = 64; // only treeify if table.length >= 64

// With treeification, our AllSameBucket example from above:
// - First 7 inserts: plain linked-list chain
// - 8th insert, with table capacity >= 64: bucket converts to a red-black tree
// - Subsequent operations: O(log n) instead of O(n)
//
// For 10,000 keys in one bucket:
// Without treeification: get() visits up to 10,000 nodes
// With treeification: get() visits at most log2(10,000) ≈ 14 nodes

// Demonstration — a treeified HashMap still returns correct results
Map<AllSameBucket, String> treeMap = new HashMap<>(128); // capacity >= 64
for (int i = 0; i < 100; i++) {
    treeMap.put(new AllSameBucket(i), "value-" + i);
}
// After 8 entries, the bucket is a red-black tree
System.out.println(treeMap.get(new AllSameBucket(99))); // value-99 (O(log 100))
```
Treeification does not require keys to be Comparable, but it benefits from it: tree nodes are ordered primarily by hash, and compareTo() breaks ties between equal hashes. If the key class does not implement Comparable, HashMap falls back to comparing class names and then System.identityHashCode(), which still works but gives the tree weaker ordering, so lookups among equal-hash keys may have to search both subtrees.
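To see the tie-breaking path avoided, here is a sketch of a colliding key that does implement Comparable; ComparableKey is a hypothetical class for illustration, not JDK code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key: a constant hashCode forces every entry into one bucket,
// but Comparable gives the red-black tree a total order to split on.
public class ComparableKey implements Comparable<ComparableKey> {
    private final int id;
    ComparableKey(int id) { this.id = id; }
    @Override public int hashCode() { return 42; } // all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof ComparableKey k && this.id == k.id;
    }
    @Override public int compareTo(ComparableKey other) {
        return Integer.compare(this.id, other.id); // used for tree ordering
    }

    public static void main(String[] args) {
        Map<ComparableKey, String> map = new HashMap<>(128);
        for (int i = 0; i < 1_000; i++) {
            map.put(new ComparableKey(i), "v" + i);
        }
        // The single bucket is treeified; compareTo() lets the tree search
        // descend one subtree instead of scanning both.
        System.out.println(map.get(new ComparableKey(777))); // v777
    }
}
```

The lookup is still correct either way; implementing Comparable on collision-prone keys only tightens the worst case.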
## Load Factor & Resize

The load factor controls how full the bucket array may get before HashMap doubles its capacity. The default is 0.75, meaning HashMap resizes once the number of entries exceeds 75% of the current capacity. Resizing allocates a new array of twice the size and redistributes every entry (Java 8 reuses each node's cached hash, so nothing is recomputed). This is an O(n) operation, but it happens rarely enough that amortised performance stays O(1).
```java
// Default: capacity = 16, loadFactor = 0.75 → threshold = 12
HashMap<Integer, String> map = new HashMap<>();
// After 12 entries: capacity stays 16
// Inserting the 13th entry: resize to capacity 32, threshold 24
// Inserting the 25th entry: resize to capacity 64, threshold 48
// ... and so on, always doubling

// Pre-sizing to avoid expensive resizes
// Formula: initialCapacity = expectedEntries / loadFactor + 1
int expected = 1_000;
float loadFactor = 0.75f;
int initialCapacity = (int) (expected / loadFactor) + 1; // 1334
HashMap<String, Object> preScaled = new HashMap<>(initialCapacity);
// (HashMap rounds the requested capacity up to the next power of two: 2048)
// No resizes will occur for up to 1,000 entries

// Guava's Maps.newHashMapWithExpectedSize() does the same calculation for you:
// HashMap<String, Object> map = Maps.newHashMapWithExpectedSize(1_000);

// Trade-offs: load factor vs memory vs performance
// Lower load factor (e.g., 0.5): fewer collisions, faster lookups, more memory
// Higher load factor (e.g., 0.9): more collisions, slower lookups, less memory
// 0.75 is the empirically tuned sweet spot between the two
```
Resizing is expensive: it walks every bucket and redistributes every entry into the new, larger array. If you know the approximate size of a map upfront, always pre-size it. For a map expected to hold N entries, pass (int)(N / 0.75) + 1 as the initial capacity to the constructor.
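A minimal sketch of that formula and of the doubling threshold sequence; capacityFor is a hypothetical helper, not a JDK method:

```java
// Sketch: the pre-sizing formula, plus the threshold sequence a map
// built with the default constructor passes through as it grows.
public class PreSizing {
    static int capacityFor(int expectedEntries) {
        // Same formula as in the text: expected / loadFactor + 1
        return (int) (expectedEntries / 0.75f) + 1;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(1_000)); // 1334

        int capacity = 16;       // default initial capacity
        float loadFactor = 0.75f;
        for (int i = 0; i < 4; i++) {
            // resize is triggered while inserting entry threshold + 1
            System.out.println("capacity=" + capacity
                    + ", threshold=" + (int) (capacity * loadFactor));
            capacity *= 2; // HashMap always doubles
        }
        // prints thresholds 12, 24, 48, 96 for capacities 16, 32, 64, 128
    }
}
```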
## equals & hashCode Contract for Keys

HashMap's correctness depends on the equals()/hashCode() contract: if two objects are equal, they must have the same hash code, and a key's hash code must not change while the key is in the map. Violating either rule makes entries effectively disappear: you put a key in, then can't get it back.
```java
// BROKEN key — hashCode changes after insertion
public class MutableKey {
    int value;
    MutableKey(int v) { this.value = v; }
    @Override public int hashCode() { return value; } // changes with value!
    @Override public boolean equals(Object o) {
        return o instanceof MutableKey k && this.value == k.value;
    }
}

HashMap<MutableKey, String> map = new HashMap<>();
MutableKey key = new MutableKey(10);
map.put(key, "hello");
key.value = 20; // mutate the key AFTER insertion
System.out.println(map.get(key)); // null — the entry sits in the bucket for hash 10, but the key now hashes to 20

// CORRECT keys are immutable, or base hash/equality on final fields only
public final class StableKey {
    private final int id;
    public StableKey(int id) { this.id = id; }
    @Override public int hashCode() { return Integer.hashCode(id); }
    @Override public boolean equals(Object o) {
        return o instanceof StableKey k && this.id == k.id;
    }
}

// Common mistake: overriding equals but NOT hashCode
public class BrokenEquals {
    int x;
    BrokenEquals(int x) { this.x = x; }
    @Override public boolean equals(Object o) {
        return o instanceof BrokenEquals b && this.x == b.x;
    }
    // hashCode NOT overridden — uses Object.hashCode() (identity hash)
    // Two "equal" objects get different hash codes → different buckets
    // → HashMap.get() almost always returns null for equal keys created separately
}

HashMap<BrokenEquals, String> brokenMap = new HashMap<>();
brokenMap.put(new BrokenEquals(5), "five");
System.out.println(brokenMap.get(new BrokenEquals(5))); // null — different identity hash, wrong bucket
```
## ConcurrentModificationException

HashMap keeps a modCount counter that increments on every structural change: inserting a new key, removing an entry, or clear(). (Overwriting the value of an existing key is not structural.) Each iterator captures modCount at creation and checks it on every next() call. If the map is structurally modified while iterating, from any thread or even the same thread, the iterator throws ConcurrentModificationException. This fail-fast mechanism surfaces bugs early rather than silently producing wrong results.
```java
import java.util.*;

Map<String, Integer> map = new HashMap<>();
map.put("a", 1); map.put("b", 2); map.put("c", 3);

// WRONG — modifying the map during iteration
try {
    for (String key : map.keySet()) {
        if (key.equals("b")) map.remove(key); // ConcurrentModificationException!
    }
} catch (ConcurrentModificationException e) {
    System.out.println("Caught: " + e.getClass().getSimpleName());
}

// CORRECT 1: use Iterator.remove()
Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
while (it.hasNext()) {
    if (it.next().getKey().equals("b")) it.remove(); // safe — uses the iterator
}

// CORRECT 2: collect keys to remove, then remove after iteration
Set<String> toRemove = new HashSet<>();
for (String key : map.keySet()) {
    if (key.equals("a")) toRemove.add(key);
}
map.keySet().removeAll(toRemove);

// CORRECT 3: removeIf (Java 8+)
map.entrySet().removeIf(e -> e.getValue() < 2);

// CORRECT 4: use ConcurrentHashMap if concurrent modification is needed
Map<String, Integer> safeMap = new java.util.concurrent.ConcurrentHashMap<>(map);
for (String key : safeMap.keySet()) {
    if (key.equals("c")) safeMap.remove(key); // no CME — its iterators are weakly consistent
}
```
ConcurrentModificationException is a best-effort detection mechanism, not a hard guarantee. In a multithreaded scenario without synchronisation, the iterator might not always detect the modification — it may return stale data instead. For truly concurrent access, use ConcurrentHashMap.
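Beyond avoiding the exception, ConcurrentHashMap's atomic operations (merge, compute, putIfAbsent) replace racy check-then-act sequences. A minimal sketch, using illustrative names (AtomicCounter, the "hits" key):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: two threads incrementing a shared counter. With a plain HashMap
// and get-then-put, updates would be lost; ConcurrentHashMap.merge performs
// the whole read-modify-write atomically.
public class AtomicCounter {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Runnable worker = () -> {
            for (int i = 0; i < 10_000; i++) {
                counts.merge("hits", 1, Integer::sum); // atomic increment
            }
        };
        Thread t1 = new Thread(worker), t2 = new Thread(worker);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counts.get("hits")); // 20000 — no lost updates
    }
}
```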
## Initial Capacity & Sizing Tips

| Scenario | Recommendation |
| --- | --- |
| Unknown size, read-heavy after build | Default constructor; rebuild when size is known |
| Known approximate size N | `new HashMap<>((int)(N / 0.75) + 1)` |
| Small map, many misses | Lower load factor (e.g., 0.5f) to reduce collision chains |
| Memory-constrained, high hit rate | Higher load factor (e.g., 0.9f) |
| Concurrent read/write from multiple threads | Use ConcurrentHashMap instead |
| Insertion-order traversal needed | Use LinkedHashMap |
| Sorted key traversal needed | Use TreeMap |
```java
// Building a map whose final size is known (e.g., loading 500 config entries)
int expectedSize = 500;
Map<String, String> config = new HashMap<>((int) (expectedSize / 0.75f) + 1);

// Equivalent with Guava:
// Map<String, String> config = Maps.newHashMapWithExpectedSize(expectedSize);

// LinkedHashMap — same internals as HashMap, plus a doubly-linked list recording insertion order
Map<String, Integer> ordered = new LinkedHashMap<>();
ordered.put("first", 1);
ordered.put("second", 2);
ordered.put("third", 3);
System.out.println(ordered.keySet()); // [first, second, third] — insertion order preserved
```
```java
// Access-order LinkedHashMap — foundation for LRU caches
Map<String, Integer> lru = new LinkedHashMap<>(16, 0.75f, true /* access-order */) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        return size() > 3; // evict the eldest when more than 3 entries
    }
};
lru.put("a", 1); lru.put("b", 2); lru.put("c", 3);
lru.get("a");    // accessing "a" moves it to the most-recently-used position
lru.put("d", 4); // triggers eviction of the least-recently-used entry
System.out.println(lru.keySet()); // [c, a, d] ("b" was evicted)
```