There are a number of performance analysis tools specialized for parallel/MPI programs, such as:
- Score-P, which works with a number of different analysis tools, e.g. Cube and Vampir
- HPCToolkit, which uses sampling only, so you do not have to recompile your application
- TAU
At first they may not be as simple to use, but they provide much more help in investigating the performance of parallel applications.