Contents
- groupingBy and partitioningBy
- Downstream Collectors
- teeing — Two Collectors in One Pass
- flatMapping and filtering Collectors
- Custom Collectors with Collector.of()
groupingBy() returns a Collector that, used with the terminal collect() operation, groups stream elements into a Map: each key is produced by applying a classifier function to an element, and each value is the List (or other downstream collection) of the elements that mapped to that key. partitioningBy() is a special case that uses a predicate to split elements into exactly two groups under the keys true and false. Both collectors accept an optional downstream collector as a second argument for further reduction of each group.
import java.util.stream.*;
import java.util.*;
record Order(String customer, String category, double amount) {}
List<Order> orders = List.of(
new Order("Alice", "Books", 29.99),
new Order("Bob", "Books", 14.99),
new Order("Alice", "Electronics", 299.99),
new Order("Bob", "Electronics", 499.99),
new Order("Alice", "Books", 9.99)
);
// groupingBy — group by a classifier function
Map<String, List<Order>> byCustomer =
orders.stream().collect(Collectors.groupingBy(Order::customer));
// {Alice=[...], Bob=[...]}
// groupingBy — group by multiple fields (cascade)
Map<String, Map<String, List<Order>>> byCustomerAndCategory =
orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.groupingBy(Order::category)));
// partitioningBy — splits into exactly two groups: true and false
Map<Boolean, List<Order>> highLow = orders.stream().collect(
Collectors.partitioningBy(o -> o.amount() > 100));
List<Order> highValue = highLow.get(true); // amount > 100
List<Order> lowValue = highLow.get(false);
// Counting in groups
Map<String, Long> countByCategory = orders.stream().collect(
Collectors.groupingBy(Order::category, Collectors.counting()));
// {Books=3, Electronics=2}
Downstream collectors reduce each group further instead of collecting it into the default List. They are passed as the second argument to groupingBy and partitioningBy:
// Sum amounts per customer
Map<String, Double> totalByCustomer = orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.summingDouble(Order::amount)));
// {Alice=339.97, Bob=514.98}
// Average per category
Map<String, Double> avgByCategory = orders.stream().collect(
Collectors.groupingBy(Order::category,
Collectors.averagingDouble(Order::amount)));
// Collect to a specific collection type
Map<String, TreeSet<Double>> sortedAmounts = orders.stream().collect(
Collectors.groupingBy(Order::category,
Collectors.mapping(Order::amount,
Collectors.toCollection(TreeSet::new))));
// summarizingDouble — count, sum, min, max, avg in one pass
Map<String, DoubleSummaryStatistics> stats = orders.stream().collect(
Collectors.groupingBy(Order::category,
Collectors.summarizingDouble(Order::amount)));
stats.forEach((cat, s) ->
System.out.printf("%s: count=%d sum=%.2f avg=%.2f%n",
cat, s.getCount(), s.getSum(), s.getAverage()));
// joining — concatenate strings
Map<String, String> customerOrders = orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.mapping(Order::category,
Collectors.joining(", ", "[", "]"))));
// {Alice=[Books, Electronics, Books], Bob=[Books, Electronics]}
// maxBy / minBy
Map<String, Optional<Order>> biggestByCustomer = orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.maxBy(Comparator.comparingDouble(Order::amount))));
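All of the downstream collectors above also work with partitioningBy; a short sketch summing each partition (the orders list is repeated here so the snippet stands alone):

```java
// partitioningBy with a downstream collector: total amount per partition
record Order(String customer, String category, double amount) {}
List<Order> orders = List.of(  // same orders as above
    new Order("Alice", "Books", 29.99), new Order("Bob", "Books", 14.99),
    new Order("Alice", "Electronics", 299.99), new Order("Bob", "Electronics", 499.99),
    new Order("Alice", "Books", 9.99));

Map<Boolean, Double> totalByPartition = orders.stream().collect(
    Collectors.partitioningBy(o -> o.amount() > 100,
        Collectors.summingDouble(Order::amount)));
// true -> 799.98, false -> 54.97 (up to floating-point rounding)
```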
Collectors.teeing(), added in Java 12, applies two collectors to the same stream simultaneously and merges their results with a BiFunction:
// Find min and max in a single pass
record MinMax(int min, int max) {}
MinMax minMax = IntStream.rangeClosed(1, 10).boxed().collect(
Collectors.teeing(
Collectors.minBy(Comparator.naturalOrder()),
Collectors.maxBy(Comparator.naturalOrder()),
(min, max) -> new MinMax(min.orElseThrow(), max.orElseThrow())
));
System.out.println(minMax); // MinMax[min=1, max=10]
// Count and sum simultaneously
record CountSum(long count, double sum) {}
CountSum cs = orders.stream().collect(
Collectors.teeing(
Collectors.counting(),
Collectors.summingDouble(Order::amount),
CountSum::new
));
System.out.printf("count=%d, sum=%.2f%n", cs.count(), cs.sum());
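teeing also gives a clean way to compute an average in one pass, dividing sum by count in the merger (a small sketch with inline values):

```java
// Average via teeing: sum and count collected together, merged at the end
double average = Stream.of(2.0, 4.0, 6.0).collect(
    Collectors.teeing(
        Collectors.summingDouble(Double::doubleValue),
        Collectors.counting(),
        (sum, count) -> sum / count));
System.out.println(average); // 4.0
```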
// Partition into high/low lists with teeing + filtering
record Split(List<Order> high, List<Order> low) {}
Split split = orders.stream().collect(
Collectors.teeing(
Collectors.filtering(o -> o.amount() > 100, Collectors.toList()),
Collectors.filtering(o -> o.amount() <= 100, Collectors.toList()),
Split::new
));
// Average and standard deviation in one pass: manual aggregation,
// an alternative to teeing (DoubleStream has no collect(Collector) overload)
double[] acc = orders.stream()
    .mapToDouble(Order::amount)
    .collect(
        () -> new double[3], // [sum, sumSq, count]
        (arr, v) -> { arr[0] += v; arr[1] += v * v; arr[2]++; },
        (a, b) -> { a[0] += b[0]; a[1] += b[1]; a[2] += b[2]; }
    );
double mean = acc[0] / acc[2];
double stdDev = Math.sqrt(acc[1] / acc[2] - mean * mean);
Collectors.flatMapping() and Collectors.filtering(), added in Java 9, are especially useful as downstream collectors:
record BlogPost(String author, List<String> tags) {}
List<BlogPost> posts = List.of(
new BlogPost("Alice", List.of("java", "streams", "functional")),
new BlogPost("Bob", List.of("java", "concurrency")),
new BlogPost("Alice", List.of("kotlin", "java"))
);
// flatMapping — flatten lists and collect per group
Map<String, List<String>> tagsByAuthor = posts.stream().collect(
Collectors.groupingBy(BlogPost::author,
Collectors.flatMapping(p -> p.tags().stream(),
Collectors.toList())));
// {Alice=[java, streams, functional, kotlin, java], Bob=[java, concurrency]}
// flatMapping with distinct
Map<String, Set<String>> uniqueTagsByAuthor = posts.stream().collect(
Collectors.groupingBy(BlogPost::author,
Collectors.flatMapping(p -> p.tags().stream(),
Collectors.toUnmodifiableSet())));
// filtering — filter elements within each group
Map<String, List<Order>> largeOrdersByCustomer = orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.filtering(o -> o.amount() > 100,
Collectors.toList())));
// Includes keys even when no elements pass the filter (unlike stream().filter())
// {Alice=[Order[customer=Alice, category=Electronics, amount=299.99]], Bob=[Order[customer=Bob, category=Electronics, amount=499.99]]}
// Compare: filtering collector keeps empty groups, stream filter does not
Map<String, Long> countHighOrders = orders.stream().collect(
Collectors.groupingBy(Order::customer,
Collectors.filtering(o -> o.amount() > 1000, Collectors.counting())));
// {Alice=0, Bob=0} — both customers have zero high-value orders, but both keys present
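For contrast, a sketch of the stream-filter-first version (the orders list is repeated so the snippet stands alone):

```java
// Contrast: filtering the stream *before* grouping drops empty groups' keys
record Order(String customer, String category, double amount) {}
List<Order> orders = List.of(  // same orders as above
    new Order("Alice", "Books", 29.99), new Order("Bob", "Books", 14.99),
    new Order("Alice", "Electronics", 299.99), new Order("Bob", "Electronics", 499.99),
    new Order("Alice", "Books", 9.99));

Map<String, Long> prefiltered = orders.stream()
    .filter(o -> o.amount() > 1000)
    .collect(Collectors.groupingBy(Order::customer, Collectors.counting()));
System.out.println(prefiltered); // {} (Alice and Bob vanish entirely)
```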
Build fully custom collectors using Collector.of() by providing supplier, accumulator, combiner, and finisher functions:
// Custom collector: accumulate into an ArrayDeque, finish as an immutable List
Collector<String, ArrayDeque<String>, List<String>> toImmutableList =
Collector.of(
ArrayDeque::new, // supplier: create mutable container
ArrayDeque::addLast, // accumulator: add element
(a, b) -> { a.addAll(b); return a; }, // combiner: merge two containers (parallel)
List::copyOf // finisher: convert to immutable list
);
List<String> result = Stream.of("a", "b", "c").collect(toImmutableList);
System.out.println(result); // [a, b, c]
// Custom collector: build a frequency map
Collector<String, Map<String, Integer>, Map<String, Integer>> freqMap =
Collector.of(
HashMap::new,
(map, s) -> map.merge(s, 1, Integer::sum),
(a, b) -> { b.forEach((k, v) -> a.merge(k, v, Integer::sum)); return a; },
Collections::unmodifiableMap
);
Map<String, Integer> freq = Stream.of("a","b","a","c","b","a").collect(freqMap);
System.out.println(freq); // {a=3, b=2, c=1}
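Worth noting: the same frequency map can be had from the built-in toMap collector, which is often simpler than Collector.of():

```java
// Equivalent with toMap: map each word to 1, merge duplicates with Integer::sum
Map<String, Integer> freq2 = Stream.of("a", "b", "a", "c", "b", "a").collect(
    Collectors.toMap(s -> s, s -> 1, Integer::sum));
System.out.println(freq2); // {a=3, b=2, c=1}
```

Unlike the custom version above, toMap returns a mutable map; wrap the result with Collections.unmodifiableMap (or use collectingAndThen) if immutability matters.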
// Custom collector: running statistics (online mean and variance)
Collector<Double, double[], double[]> runningStats = Collector.of(
    () -> new double[3], // [count, mean, M2] (Welford's online algorithm)
    (arr, x) -> {
        arr[0]++;
        double delta = x - arr[1];
        arr[1] += delta / arr[0];
        arr[2] += delta * (x - arr[1]);
    },
    (a, b) -> { // parallel merge (Chan et al.'s pairwise formula)
        double n = a[0] + b[0];
        if (n == 0) return a;
        double delta = b[1] - a[1];
        a[1] += delta * b[0] / n;
        a[2] += b[2] + delta * delta * a[0] * b[0] / n;
        a[0] = n;
        return a;
    },
    arr -> arr // finisher: caller reads arr[0]=n, arr[1]=mean, arr[2]=M2
);
Pass Collector.Characteristics flags to Collector.of() so the framework can optimize: UNORDERED when the collector does not depend on encounter order (which helps parallel performance), and IDENTITY_FINISH when the mutable container is itself the final result, letting the framework skip the finisher step.
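A minimal sketch of the flags in use, assuming a set collector where order is irrelevant; the no-finisher Collector.of() overload adds IDENTITY_FINISH automatically:

```java
// UNORDERED: element order is irrelevant; no finisher is passed, so the
// HashSet accumulator is returned as-is (IDENTITY_FINISH is implied)
Collector<String, Set<String>, Set<String>> toUnorderedSet =
    Collector.of(
        HashSet::new,
        Set::add,
        (a, b) -> { a.addAll(b); return a; },
        Collector.Characteristics.UNORDERED);

Set<String> letters = Stream.of("a", "b", "a").parallel().collect(toUnorderedSet);
System.out.println(letters); // [a, b]
```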