How do I create a Stream of regex matches?

In Java 8, there is Pattern.splitAsStream, which provides a stream of the items split by a delimiter pattern, but unfortunately there is no corresponding method for getting a stream of matches.
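To illustrate the difference, here is a minimal sketch of what Pattern.splitAsStream returns. The class name SplitDemo and the helper split are made up for this illustration; only Pattern.compile and splitAsStream are actual JDK methods.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class SplitDemo {
    // Hypothetical helper: returns the pieces *between* matches of the
    // delimiter regex, not the matches themselves.
    static List<String> split(String delimiterRegex, CharSequence input) {
        return Pattern.compile(delimiterRegex)
                      .splitAsStream(input)
                      .collect(Collectors.toList());
    }
}
```

Here SplitDemo.split("\\d+", "a1b22c") yields the text around the digits ("a", "b", "c"), whereas a stream of matches would contain the digit runs themselves.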

If you are going to implement such a Stream yourself, I recommend implementing Spliterator directly rather than implementing an Iterator and wrapping it. You may be more familiar with Iterator, but implementing a simple Spliterator is straightforward:

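import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;

final class MatchItr extends Spliterators.AbstractSpliterator<String> {
    private final Matcher matcher;
    MatchItr(Matcher m) {
        // The region length serves as a rough estimate of the number of matches
        super(m.regionEnd()-m.regionStart(), ORDERED|NONNULL);
        matcher=m;
    }
    // Passes the next match, if there is one, to the action
    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if(!matcher.find()) return false;
        action.accept(matcher.group());
        return true;
    }
}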

You may consider overriding forEachRemaining with a straightforward loop, though.
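A minimal sketch of such an override might look like the following. It repeats the MatchItr logic under the illustrative name MatchSpliterator, and the allMatches helper is a hypothetical addition for demonstration only.

```java
import java.util.List;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Variant of MatchItr; the name MatchSpliterator is only illustrative.
final class MatchSpliterator extends Spliterators.AbstractSpliterator<String> {
    private final Matcher matcher;

    MatchSpliterator(Matcher m) {
        // Same size estimate and characteristics as MatchItr.
        super(m.regionEnd() - m.regionStart(), ORDERED | NONNULL);
        matcher = m;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (!matcher.find()) return false;
        action.accept(matcher.group());
        return true;
    }

    @Override
    public void forEachRemaining(Consumer<? super String> action) {
        // Bulk traversal: a plain loop instead of repeated tryAdvance calls.
        while (matcher.find()) action.accept(matcher.group());
    }

    // Hypothetical helper for demonstration: collects all matches of regex in input.
    static List<String> allMatches(String regex, CharSequence input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return StreamSupport.stream(new MatchSpliterator(m), false)
                            .collect(Collectors.toList());
    }
}
```

For example, MatchSpliterator.allMatches("\\d+", "a1b22c333") returns the list [1, 22, 333]. Terminal operations that consume the whole stream, such as collect, can then use the loop rather than calling tryAdvance once per element.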


If I understand your attempt correctly, the solution should look more like:

Pattern pattern = Pattern.compile(
                 "[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)");

try(BufferedReader br=new BufferedReader(System.console().reader())) {

    br.lines()
      .flatMap(line -> StreamSupport.stream(new MatchItr(pattern.matcher(line)), false))
      .collect(Collectors.groupingBy(o->o, TreeMap::new, Collectors.counting()))
      .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}

Java 9 provides a method Stream<MatchResult> results() directly on Matcher. But for finding matches within an input source such as a Reader, there’s an even more convenient method on Scanner, findAll. With that, the implementation simplifies to:

try(Scanner s = new Scanner(System.console().reader())) {
    s.findAll(pattern)
     .collect(Collectors.groupingBy(MatchResult::group,TreeMap::new,Collectors.counting()))
     .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}
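For comparison, the Matcher.results() method mentioned above could be combined with a stream of lines roughly as follows. This is only a sketch assuming Java 9 or later; the class ResultsDemo and the helper countMatches are hypothetical names introduced for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ResultsDemo {
    // Hypothetical helper: counts how often each match of the pattern
    // occurs across all lines, sorted by the matched text.
    static Map<String, Long> countMatches(Pattern pattern, Stream<String> lines) {
        return lines
            // Java 9+: Matcher.results() returns a Stream<MatchResult>
            .flatMap(line -> pattern.matcher(line).results())
            .collect(Collectors.groupingBy(MatchResult::group, TreeMap::new, Collectors.counting()));
    }
}
```

Passing br.lines() from the BufferedReader example as the second argument would replace the custom Spliterator entirely, though matches spanning several lines would still be missed, just as with the per-line Java 8 version.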

This answer contains a back-port of Scanner.findAll that can be used with Java 8.
