Copy a stream to avoid “stream has already been operated upon or closed”

I think your assumption about efficiency is kind of backwards. You get a huge efficiency payback if you’re only going to use the data once, because you don’t have to store it, and streams give you powerful “loop fusion” optimizations that let all of the data flow efficiently through the pipeline in a single pass.

If you want to re-use the same data, then by definition you either have to generate it twice (deterministically) or store it. If it already happens to be in a collection, great; then iterating it twice is cheap.
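As a rough illustration of those two options, here is a minimal sketch. The class and method names (`ReuseData`, `storeThenTraverseTwice`, `generateTwice`, `secondTraversalFails`) are made up for this example; only the `java.util.stream` calls are standard API. It also shows that a second terminal operation on the same `Stream` object is what triggers the `IllegalStateException` from the title.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReuseData {

    // Option 1: store the data once, then open a fresh stream per traversal.
    static long[] storeThenTraverseTwice(Stream<String> source) {
        List<String> stored = source.collect(Collectors.toList());
        long total = stored.stream().count();
        long nonEmpty = stored.stream().filter(s -> !s.isEmpty()).count();
        return new long[] { total, nonEmpty };
    }

    // Option 2: regenerate the data. The supplier must produce the same
    // elements on every call, otherwise the two results are not comparable.
    static long[] generateTwice(Supplier<Stream<String>> source) {
        long total = source.get().count();
        long nonEmpty = source.get().filter(s -> !s.isEmpty()).count();
        return new long[] { total, nonEmpty };
    }

    // What not to do: run two terminal operations on one Stream object.
    static boolean secondTraversalFails(Stream<String> source) {
        source.count(); // first terminal operation consumes the stream
        try {
            source.count(); // second one is rejected
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        long[] stored = storeThenTraverseTwice(Stream.of("a", "", "b"));
        System.out.println(stored[0] + " total, " + stored[1] + " non-empty");

        long[] regenerated = generateTwice(() -> Stream.of("a", "", "b"));
        System.out.println(regenerated[0] + " total, " + regenerated[1] + " non-empty");

        System.out.println("reuse fails: " + secondTraversalFails(Stream.of("a", "b")));
    }
}
```

Which option is cheaper depends on the source: storing costs memory proportional to the data, while regenerating costs whatever it takes to produce the data a second time.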

We did experiment in the design with “forked streams”. What we found was that supporting this had real costs; it burdened the common case (use once) for the benefit of the uncommon case. The big problem was dealing with “what happens when the two pipelines don’t consume data at the same rate.” Now you’re back to buffering anyway. This was a feature that clearly didn’t carry its weight.

If you want to operate on the same data repeatedly, either store it, or structure your operations as Consumers and do the following:

stream()...stuff....forEach(e -> { consumerA(e); consumerB(e); });
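A runnable version of that pattern might look like the sketch below. The class name `FanOut`, the method `feedBoth`, and the two sample consumers are assumptions chosen for illustration; `Consumer.andThen` is standard API and is just a compact way of writing `e -> { consumerA.accept(e); consumerB.accept(e); }`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class FanOut {

    // Feeds every element of one stream to two consumers in a single pass.
    // Returns the two result lists: upper-cased elements, and elements
    // longer than three characters.
    static List<List<String>> feedBoth(Stream<String> source) {
        List<String> upper = new ArrayList<>();
        List<String> longWords = new ArrayList<>();

        Consumer<String> consumerA = e -> upper.add(e.toUpperCase());
        Consumer<String> consumerB = e -> {
            if (e.length() > 3) {
                longWords.add(e);
            }
        };

        // Assumes a sequential stream: ArrayList is not thread-safe, so a
        // parallel stream would need synchronized or concurrent collections.
        source.forEach(consumerA.andThen(consumerB));

        return Arrays.asList(upper, longWords);
    }

    public static void main(String[] args) {
        List<List<String>> results = feedBoth(Stream.of("one", "three", "five"));
        System.out.println(results.get(0)); // [ONE, THREE, FIVE]
        System.out.println(results.get(1)); // [three, five]
    }
}
```

The stream is traversed once and nothing is buffered; the trade-off is that both operations have to be expressed as side-effecting consumers rather than as separate pipelines.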

You might also look into the RxJava library, as its processing model lends itself better to this kind of “stream forking”.
