Contents
- groupingBy and partitioningBy
- Downstream Collectors
- teeing — Two Collectors in One Pass
- flatMapping and filtering Collectors
- Custom Collectors with Collector.of()
groupingBy() returns a Collector that, used with the terminal collect() operation, groups stream elements into a Map: each key is produced by applying a classifier function to an element, and each value is the List (or other downstream collection) of the elements that mapped to that key. partitioningBy() is a special case that uses a predicate to split elements into exactly two groups under the keys true and false. Both collectors accept an optional downstream collector as a second argument for further reduction of each group.
import java.util.stream.*;
import java.util.*;
record Order(String customer, String category, double amount) {}
List<Order> orders = List.of(
new Order("Alice", "Books", 29.99),
new Order("Bob", "Books", 14.99),
new Order("Alice", "Electronics", 299.99),
new Order("Bob", "Electronics", 499.99),
new Order("Alice", "Books", 9.99)
);
// groupingBy — group by a classifier function
Map<String, List<Order>> byCustomer =
orders.stream().collect(Collectors.groupingBy(Order::customer));
// {Alice=[...], Bob=[...]}
// groupingBy — group by multiple fields (cascade)
Map<String, Map<String, List<Order>>> byCustomerAndCategory =
orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.groupingBy(Order::category)));
// partitioningBy — splits into exactly two groups: true and false
Map<Boolean, List<Order>> highLow = orders.stream().collect(
Collectors.partitioningBy(o -> o.amount() > 100));
List<Order> highValue = highLow.get(true); // amount > 100
List<Order> lowValue = highLow.get(false);
// Counting in groups
Map<String, Long> countByCategory = orders.stream().collect(
Collectors.groupingBy(Order::category, Collectors.counting()));
// {Books=3, Electronics=2}
Downstream collectors reduce each group further instead of collecting it into the default List. They are passed as the second argument to groupingBy and partitioningBy:
// Sum amounts per customer
Map<String, Double> totalByCustomer = orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.summingDouble(Order::amount)));
// {Alice=339.97, Bob=514.98}
// Average per category
Map<String, Double> avgByCategory = orders.stream().collect(
Collectors.groupingBy(Order::category,
Collectors.averagingDouble(Order::amount)));
// Collect to a specific collection type
Map<String, TreeSet<Double>> sortedAmounts = orders.stream().collect(
Collectors.groupingBy(Order::category,
Collectors.mapping(Order::amount,
Collectors.toCollection(TreeSet::new))));
// summarizingDouble — count, sum, min, max, avg in one pass
Map<String, DoubleSummaryStatistics> stats = orders.stream().collect(
Collectors.groupingBy(Order::category,
Collectors.summarizingDouble(Order::amount)));
stats.forEach((cat, s) ->
System.out.printf("%s: count=%d sum=%.2f avg=%.2f%n",
cat, s.getCount(), s.getSum(), s.getAverage()));
// joining — concatenate strings
Map<String, String> customerOrders = orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.mapping(Order::category,
Collectors.joining(", ", "[", "]"))));
// {Alice=[Books, Electronics, Books], Bob=[Books, Electronics]}
// maxBy / minBy
Map<String, Optional<Order>> biggestByCustomer = orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.maxBy(Comparator.comparingDouble(Order::amount))));
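All of the downstream collectors above also work with partitioningBy; a short sketch summing each partition (the orders list is repeated here so the snippet stands alone):

```java
// partitioningBy with a downstream collector: total amount per partition
record Order(String customer, String category, double amount) {}
List<Order> orders = List.of(  // same orders as above
    new Order("Alice", "Books", 29.99), new Order("Bob", "Books", 14.99),
    new Order("Alice", "Electronics", 299.99), new Order("Bob", "Electronics", 499.99),
    new Order("Alice", "Books", 9.99));

Map<Boolean, Double> totalByPartition = orders.stream().collect(
    Collectors.partitioningBy(o -> o.amount() > 100,
        Collectors.summingDouble(Order::amount)));
// true -> 799.98, false -> 54.97 (up to floating-point rounding)
```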
Collectors.teeing(), added in Java 12, applies two collectors to the same stream simultaneously and merges their results with a BiFunction:
// Find min and max in a single pass
record MinMax(int min, int max) {}
MinMax minMax = IntStream.rangeClosed(1, 10).boxed().collect(
Collectors.teeing(
Collectors.minBy(Comparator.naturalOrder()),
Collectors.maxBy(Comparator.naturalOrder()),
(min, max) -> new MinMax(min.orElseThrow(), max.orElseThrow())
));
System.out.println(minMax); // MinMax[min=1, max=10]
// Count and sum simultaneously
record CountSum(long count, double sum) {}
CountSum cs = orders.stream().collect(
Collectors.teeing(
Collectors.counting(),
Collectors.summingDouble(Order::amount),
CountSum::new
));
System.out.printf("count=%d, sum=%.2f%n", cs.count(), cs.sum());
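teeing also gives a clean way to compute an average in one pass, dividing sum by count in the merger (a small sketch with inline values):

```java
// Average via teeing: sum and count collected together, merged at the end
double average = Stream.of(2.0, 4.0, 6.0).collect(
    Collectors.teeing(
        Collectors.summingDouble(Double::doubleValue),
        Collectors.counting(),
        (sum, count) -> sum / count));
System.out.println(average); // 4.0
```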
// Partition into high/low lists with teeing + filtering
record Split(List<Order> high, List<Order> low) {}
Split split = orders.stream().collect(
Collectors.teeing(
Collectors.filtering(o -> o.amount() > 100, Collectors.toList()),
Collectors.filtering(o -> o.amount() <= 100, Collectors.toList()),
Split::new
));
// Average and standard deviation in one pass: manual aggregation,
// an alternative to teeing (DoubleStream has no collect(Collector) overload)
double[] acc = orders.stream()
    .mapToDouble(Order::amount)
    .collect(
        () -> new double[3], // [sum, sumSq, count]
        (arr, v) -> { arr[0] += v; arr[1] += v * v; arr[2]++; },
        (a, b) -> { a[0] += b[0]; a[1] += b[1]; a[2] += b[2]; }
    );
double mean = acc[0] / acc[2];
double stdDev = Math.sqrt(acc[1] / acc[2] - mean * mean);
Collectors.flatMapping() and Collectors.filtering(), added in Java 9, are especially useful as downstream collectors:
record BlogPost(String author, List<String> tags) {}
List<BlogPost> posts = List.of(
new BlogPost("Alice", List.of("java", "streams", "functional")),
new BlogPost("Bob", List.of("java", "concurrency")),
new BlogPost("Alice", List.of("kotlin", "java"))
);
// flatMapping — flatten lists and collect per group
Map<String, List<String>> tagsByAuthor = posts.stream().collect(
Collectors.groupingBy(BlogPost::author,
Collectors.flatMapping(p -> p.tags().stream(),
Collectors.toList())));
// {Alice=[java, streams, functional, kotlin, java], Bob=[java, concurrency]}
// flatMapping with distinct
Map<String, Set<String>> uniqueTagsByAuthor = posts.stream().collect(
Collectors.groupingBy(BlogPost::author,
Collectors.flatMapping(p -> p.tags().stream(),
Collectors.toUnmodifiableSet())));
// filtering — filter elements within each group
Map<String, List<Order>> largeOrdersByCustomer = orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.filtering(o -> o.amount() > 100,
Collectors.toList())));
// Includes keys even when no elements pass the filter (unlike stream().filter())
// {Alice=[Order[customer=Alice, category=Electronics, amount=299.99]], Bob=[Order[customer=Bob, category=Electronics, amount=499.99]]}
// Compare: filtering collector keeps empty groups, stream filter does not
Map<String, Long> countHighOrders = orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.filtering(o -> o.amount() > 1000, Collectors.counting())));
// {Alice=0, Bob=0} — both customers have zero high-value orders, but both keys present
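For contrast, a sketch of the stream-filter-first version (the orders list is repeated so the snippet stands alone):

```java
// Contrast: filtering the stream *before* grouping drops empty groups' keys
record Order(String customer, String category, double amount) {}
List<Order> orders = List.of(  // same orders as above
    new Order("Alice", "Books", 29.99), new Order("Bob", "Books", 14.99),
    new Order("Alice", "Electronics", 299.99), new Order("Bob", "Electronics", 499.99),
    new Order("Alice", "Books", 9.99));

Map<String, Long> prefiltered = orders.stream()
    .filter(o -> o.amount() > 1000)
    .collect(Collectors.groupingBy(Order::customer, Collectors.counting()));
System.out.println(prefiltered); // {} (Alice and Bob vanish entirely)
```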
Build fully custom collectors using Collector.of() by providing supplier, accumulator, combiner, and finisher functions:
// Custom collector: accumulate into an ArrayDeque, finish as an immutable List
Collector<String, ArrayDeque<String>, List<String>> toImmutableList =
Collector.of(
ArrayDeque::new, // supplier: create mutable container
ArrayDeque::addLast, // accumulator: add element
(a, b) -> { a.addAll(b); return a; }, // combiner: merge two containers (parallel)
List::copyOf // finisher: convert to immutable list
);
List<String> result = Stream.of("a", "b", "c").collect(toImmutableList);
System.out.println(result); // [a, b, c]
// Custom collector: build a frequency map
Collector<String, Map<String, Integer>, Map<String, Integer>> freqMap =
Collector.of(
HashMap::new,
(map, s) -> map.merge(s, 1, Integer::sum),
(a, b) -> { b.forEach((k, v) -> a.merge(k, v, Integer::sum)); return a; },
Collections::unmodifiableMap
);
Map<String, Integer> freq = Stream.of("a","b","a","c","b","a").collect(freqMap);
System.out.println(freq); // {a=3, b=2, c=1}
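Worth noting: the same frequency map can be had from the built-in toMap collector, which is often simpler than Collector.of():

```java
// Equivalent with toMap: map each word to 1, merge duplicates with Integer::sum
Map<String, Integer> freq2 = Stream.of("a", "b", "a", "c", "b", "a").collect(
    Collectors.toMap(s -> s, s -> 1, Integer::sum));
System.out.println(freq2); // {a=3, b=2, c=1}
```

Unlike the custom version above, toMap returns a mutable map; wrap the result with Collections.unmodifiableMap (or use collectingAndThen) if immutability matters.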
// Custom collector: running statistics (online mean and variance)
Collector<Double, double[], double[]> runningStats = Collector.of(
    () -> new double[3], // [count, mean, M2] (Welford's online algorithm)
    (arr, x) -> {
        arr[0]++;
        double delta = x - arr[1];
        arr[1] += delta / arr[0];
        arr[2] += delta * (x - arr[1]);
    },
    (a, b) -> { // parallel merge (Chan et al.'s pairwise formula)
        double n = a[0] + b[0];
        if (n == 0) return a;
        double delta = b[1] - a[1];
        a[1] += delta * b[0] / n;
        a[2] += b[2] + delta * delta * a[0] * b[0] / n;
        a[0] = n;
        return a;
    },
    arr -> arr // finisher: caller reads arr[0]=n, arr[1]=mean, arr[2]=M2
);
Pass Collector.Characteristics flags to Collector.of() so the framework can optimize: UNORDERED when the collector does not depend on encounter order (which helps parallel performance), and IDENTITY_FINISH when the mutable container is itself the final result, letting the framework skip the finisher step.
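A minimal sketch of the flags in use, assuming a set collector where order is irrelevant; the no-finisher Collector.of() overload adds IDENTITY_FINISH automatically:

```java
// UNORDERED: element order is irrelevant; no finisher is passed, so the
// HashSet accumulator is returned as-is (IDENTITY_FINISH is implied)
Collector<String, Set<String>, Set<String>> toUnorderedSet =
    Collector.of(
        HashSet::new,
        Set::add,
        (a, b) -> { a.addAll(b); return a; },
        Collector.Characteristics.UNORDERED);

Set<String> letters = Stream.of("a", "b", "a").parallel().collect(toUnorderedSet);
System.out.println(letters); // [a, b]
```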