The observed variance is caused by background JIT compilation.
This is how the process looks like:
- Method
f()
starts execution in interpreter. - After a number of invocations (around 250) the method is scheduled for compilation.
- The compiler thread works in parallel to the application thread. Meanwhile the method continues execution in interpreter.
- As soon as the compiler thread finishes compilation, the method entry point is replaced, so the next call to
f()
will invoke the compiled version of the method.
There is basically a race between applcation thread and JIT compiler thread. Interpreter may perform different number of calls before the compiled version of the method is ready. At the end there is a mix of interpreted and compiled frames.
No wonder that compiled frame layout differs from interpreted one. Compiled frames are usually smaller; they don’t need to store all the execution context on the stack (method reference, constant pool reference, profiler data, all arguments, expression variables etc.)
Futhermore, there is even more race possibilities with Tiered Compilation (default since JDK 8). There can be a combination of 3 types of frames: interpreter, C1 and C2 (see below).
Let’s have some fun experiments to support the theory.
-
Pure interpreted mode. No JIT compilation.
No races => stable results.$ java -Xint Main 11895 11895 11895
-
Disable background compilation. JIT is ON, but is synchronized with the application thread.
No races again, but the number of calls is now higher due to compiled frames.$ java -XX:-BackgroundCompilation Main 23462 23462 23462
-
Compile everything with C1 before execution. Unlike previous case there will be no interpreted frames on the stack, so the number will be a bit higher.
$ java -Xcomp -XX:TieredStopAtLevel=1 Main 23720 23720 23720
-
Now compile everything with C2 before execution. This will produce the most optimized code with the smallest frame. The number of calls will be the highest.
$ java -Xcomp -XX:-TieredCompilation Main 59300 59300 59300
Since the default stack size is 1M, this should mean the frame now is only 16 bytes long. Is it?
$ java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=print,Main.f Main 0x00000000025ab460: mov %eax,-0x6000(%rsp) ; StackOverflow check 0x00000000025ab467: push %rbp ; frame link 0x00000000025ab468: sub $0x10,%rsp 0x00000000025ab46c: movabs $0xd7726ef0,%r10 ; r10 = Main.class 0x00000000025ab476: addl $0x2,0x68(%r10) ; Main.counter += 2 0x00000000025ab47b: callq 0x00000000023c6620 ; invokestatic f() 0x00000000025ab480: add $0x10,%rsp 0x00000000025ab484: pop %rbp ; pop frame 0x00000000025ab485: test %eax,-0x23bb48b(%rip) ; safepoint poll 0x00000000025ab48b: retq
In fact, the frame here is 32 bytes, but JIT has inlined one level of recursion.
-
Finally, let’s look at the mixed stack trace. In order to get it, we’ll crash JVM on StackOverflowError (option available in debug builds).
$ java -XX:AbortVMOnException=java.lang.StackOverflowError Main
The crash dump
hs_err_pid.log
contains the detailed stack trace where we can find interpreted frames at the bottom, C1 frames in the middle and lastly C2 frames on the top.Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) J 164 C2 Main.f()V (12 bytes) @ 0x00007f21251a5958 [0x00007f21251a5900+0x0000000000000058] J 164 C2 Main.f()V (12 bytes) @ 0x00007f21251a5920 [0x00007f21251a5900+0x0000000000000020] // ... repeated 19787 times ... J 164 C2 Main.f()V (12 bytes) @ 0x00007f21251a5920 [0x00007f21251a5900+0x0000000000000020] J 163 C1 Main.f()V (12 bytes) @ 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac] J 163 C1 Main.f()V (12 bytes) @ 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac] // ... repeated 1866 times ... J 163 C1 Main.f()V (12 bytes) @ 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac] j Main.f()V+8 j Main.f()V+8 // ... repeated 1839 times ... j Main.f()V+8 j Main.main([Ljava/lang/String;)V+0 v ~StubRoutines::call_stub