It looks like a minor JIT bug. For a small branch probability, it generates something like the following, just much more complicated due to unrolling (I’m simplifying a lot):
movzwl 0x16(%r8,%r10,2),%r9d
Get the char: int r9d = queries[r10]
imul $0x8acbf29,%r9d,%ebx
Multiply: ebx = 145538857 * r9d
shr $0x1b,%ebx
Shift: ebx >>>= 27
cmp %edx,%ebx
jae somewhere
Check bounds: if (ebx > edx || ebx < 0) goto somewhere
(and throw there an IndexOutOfBoundsException
.
cmp %r9d,%ebx
jne back
Jump back to loop beginning if not equal: if (r9d != ebx) goto back
inc %eax
Increment the result: eax++
jne back
Simply goto back
We can see one smart and one dumb thing:
- The bound check gets done optimally with a single unsigned comparison.
- It’s completely redundant as x >>> 27 is always positive and less than the table length (32).
For a branch probability above 20%, these three instructions
cmp %r9d,%ebx
jne back
inc %eax
get replaced by something like
mov %eax,%ecx // ecx = result
inc %ecx // ecx++
cmp %r9d,%ebx // if (r9b == index)
cmoveq %ecx,%ebx // result = ecx
using a conditional move instruction. This is one instruction more, but no branching and thus no branch misprediction penalty.
This explains the very constant time in the 20-80% range. The ramping below 20% is clearly due to branch mispredictions.
So it looks like a JIT failure to use the proper threshold of about 0.04 rather than 0.18.