JDK9 randomization on immutable sets and maps

Well it turns out there is another reason for the randomized iteration order. It’s not a big secret or anything. I thought I had explained it in that talk, but maybe not. I probably mentioned it on the OpenJDK mailing lists or perhaps in internal discussions.

In any case, another reason for randomized iteration order is to preserve flexibility for future implementation changes.

This turns out to be a bigger deal than most people think. Historically, HashSet and HashMap have never specified a particular iteration order. From time to time, however, the implementation needed to change, to improve performance or to fix bugs. Any change to iteration order generated a lot of flak from users. Over the years, a lot of resistance built up to changing iteration order, and this made maintenance of HashMap more difficult.

To see why this is a problem, consider a spectrum of different policies for managing the stability of iteration order:

  1. Specify the iteration order, and stick to it.

  2. Leave iteration order unspecified, but implicitly keep iteration order stable.

  3. Leave iteration order unspecified, but change the iteration order as little as possible.

  4. Change the iteration order frequently, e.g., in update releases.

  5. Change the iteration order more frequently, e.g., from one run of the JVM to the next.

  6. Change the iteration order even more frequently, e.g., from one iteration to the next.

When collections were introduced in JDK 1.2, HashMap iteration order was unspecified. Stable iteration order was provided by LinkedHashMap at a somewhat higher cost. If you didn’t need a stable iteration order, you shouldn’t have to pay for it. This ruled out #1 and #2.

For the next several releases, we tried to keep iteration order stable, even though the specification allowed it to change. Nobody likes it when code breaks, and it’s pretty unpleasant to have to tell a customer that his code is broken because it depends on iteration order.

So we ended up with policy #3, keeping iteration order as stable as possible, although it did change from time to time. For example, we introduced alternative hashing in JDK 7u6 (code review for JDK-7118743) and tree bins in JDK 8 (JEP 180), and both changed HashMap iteration order in some circumstances. Ordering also changed a couple times in earlier releases. Somebody did some archaeology and found that iteration order changed an average of once per major JDK release.

This was the worst of all possible worlds. Major releases only occurred once every couple years. When one came out, everybody’s code would break. There would be much wailing and gnashing of teeth, people would fix their code, and we’d promise to never, ever change iteration order again. A couple years would go by and new code would be written that inadvertently depended on iteration order. Then we’d come out with another major release that changed the iteration order, and this would break everybody’s code again. And the cycle would begin anew.

I wanted to avoid repeating this cycle for the new collections. Instead of keeping iteration order as stable as possible, I pursued a policy of changing it as frequently as possible. Initially the order changed on every iteration, but this imposed some overhead. Eventually we settled on once per JVM invocation. The cost is a 32-bit XOR operation per table probe, which I think is pretty cheap.

To a certain extent this is about “toughening up” application code. If changing iteration order breaks code, then breaking that code more frequently will cause it to develop resistance that kind of breakage. Of course, the code doesn’t get stronger by itself; it requires more developer effort for this to happen. And people will quite reasonably complain about having to do this additional work.

But the “toughening” of application code is in some sense secondary to the other goal of preserving the freedom to change the implementation. Preserving iteration order of HashMap has made it more difficult to maintain. Randomized iteration order in the new collections means that we needn’t worry about preserving iteration order when modifying them, so they’re easier to maintain and enhance.

For example, the current implementation (Java 9, pre-GA, July 2017) has three field-based implementations of Set (Set0, Set1, and Set2) and an array-based implementation (SetN) that uses a simple closed hashing with linear probing scheme. In the future, we might want to add a Set3 implementation that holds three elements in three fields. Or, we might want to change the collision resolution policy of SetN from linear probing to something more sophisticated. We can completely restructure the implementation, even in minor releases, if we don’t have to deal with preserving iteration order.

In summary, the tradeoff is that application developers have to do more work to make sure their code resists breakage from iteration order change. This is likely work they’d have to do at some point anyway with HashMap. What’s gained by this is more opportunities for the JDK to deliver improved performance and space efficiency, from which everybody can benefit.

Leave a Comment