Any recommended Java profiling tutorial? [closed]

Profiling is a subject having more than one school of thought.

The more popular one is that you proceed by getting measurements. That is, you try to see how long each function takes and/or how many times it is called. Clearly, if a function takes very little time, then speeding it up will gain you little. But if it takes a lot of time, then you have to do detective work to figure out what part of the function is responsible for the time. Do not expect function times to add up to total time, because functions call each other, and the reason function A may take a lot of time is that it calls function B that also takes a lot of time.

This approach can find a lot of problems, but it depends on you being a good detective and being able to think clearly about different kinds of time, like wall-clock time versus CPU time, and self-time versus inclusive time. For example, an application may appear to be slow but the function times may be all reported as near zero. This can be caused by the program being I/O bound. If the I/O is something that you expect, that may be fine, but it may be doing some I/O that you don’t know about, and then you are back to detective work.

General expectation with profilers is that if you can fix enough things to get a 10% or 20% speedup, that’s pretty good, and I never hear stories of profilers being used repeatedly to get speedups of much more than that.

Another approach is not to measure, but to capture. It is based on the idea that, during a time when the program is taking longer (in wall-clock time) than you would like, you want to know what it is doing, predominantly, and one way to find out is to stop it and ask, or take a snapshot of its state and analyze it to understand completely what it is doing and why it is doing it at that particular point in time. If you do this multiple times and you see something that it is trying to do at multiple times, then that activity is something that you could fruitfully optimize. The difference is that you are not asking how much; you are asking what and why. Here’s another explanation. (Notice that the speed of taking such a snapshot doesn’t matter, because you’re not asking about time, you’re asking what the program is doing and why.)

In the case of Java, here is one low-tech but highly effective way to do that, or you can use the “pause” button in Eclipse. Another way is to use a particular type of profiler, one that samples the entire call stack, on wall-clock time (not CPU unless you want to be blind to I/O), when you want it to sample (for example, not when waiting for user input), and summarizes at the level of lines of code, not just at the level of functions, and percent of time, not absolute time. To get percent of time, it should tell you, for each line of code that appears on any sample, the percent of samples containing that line, because if you could make that line go away, you would save that percent. (You should ignore other things it tries to tell you about, like call graphs, recursion, and self-time.) There are very few profilers that meet this specification, but one is RotateRight/Zoom, but I’m not sure if it works with Java, and there may be others.

In some cases it may be difficult to get stack samples when you want them, during the time of actual slowness. Then, since what you are after is percentages, you can do anything to the code that makes it easier to get samples without altering the percentages. One way is to amplify the code by wrapping a temporary loop around it of, say, 100 iterations. Another way is, under a debugger, to set a data-change breakpoint. This will cause the code to be interpreted 10-100 times slower than normal. Another way is to employ an alarm-clock timer to go off during the period of slowness, and use it to grab a sample.

With the capturing technique, if you use it repeatedly to find and perform multiple optimizations, you can expect to reach near-optimal performance. In the case of large software, where bottlenecks are more numerous, this can mean substantial factors. People on Stack Overflow have reported factors from 7x to 60x. Here is a detailed example of 43x.

The capturing technique has trouble with cases where it is hard to figure out why the threads are waiting when they are, such as when waiting for a transaction to complete on another processor. (Measuring has the same problem.) In those cases, I use a laborious method of merging time-stamped logs.

Leave a Comment