Strange branching performance

It looks like a minor JIT bug. For a small branch probability, it generates something like the following, just much more complicated due to unrolling (I’m simplifying a lot):

movzwl 0x16(%r8,%r10,2),%r9d

Get the char: int r9d = queries[r10]

imul   $0x8acbf29,%r9d,%ebx

Multiply: ebx = 145538857 * r9d

shr    $0x1b,%ebx

Shift: ebx >>>= 27

cmp    %edx,%ebx
jae    somewhere

Check bounds: if (ebx > edx || ebx < 0) goto somewhere (and throw there an IndexOutOfBoundsException.

cmp    %r9d,%ebx
jne    back

Jump back to loop beginning if not equal: if (r9d != ebx) goto back

inc    %eax

Increment the result: eax++

jne    back

Simply goto back

We can see one smart and one dumb thing:

  • The bound check gets done optimally with a single unsigned comparison.
  • It’s completely redundant as x >>> 27 is always positive and less than the table length (32).

For a branch probability above 20%, these three instructions

cmp    %r9d,%ebx
jne    back
inc    %eax

get replaced by something like

mov    %eax,%ecx   // ecx = result
inc    %ecx        // ecx++
cmp    %r9d,%ebx   // if (r9b == index)
cmoveq %ecx,%ebx   //    result = ecx

using a conditional move instruction. This is one instruction more, but no branching and thus no branch misprediction penalty.

This explains the very constant time in the 20-80% range. The ramping below 20% is clearly due to branch mispredictions.

So it looks like a JIT failure to use the proper threshold of about 0.04 rather than 0.18.

thr

Leave a Comment