Why does GCC __builtin_prefetch not improve performance?

Yes, some recent versions of GCC (e.g. 4.9, as of March 2015) are able to emit PREFETCH instructions on their own when optimizing with -O3, even without any explicit __builtin_prefetch.

We don’t know what get_neighbor does, or what the types of v and neigh_vals are.

And prefetching is not always profitable. Adding explicit __builtin_prefetch can slow down your code. You need to measure.
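To make "measure" concrete, here is a minimal timing sketch. The helper name best_time_ns and its parameters are made up for illustration (they are not from the question); it simply runs a callable several times and keeps the fastest wall-clock time, which tends to be less noisy than a single run.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the question): runs `work` `repeats` times
// and returns the fastest wall-clock time observed, in nanoseconds.
template <typename F>
long long best_time_ns(F&& work, int repeats) {
  assert(repeats > 0);
  long long best = -1;
  for (int r = 0; r < repeats; r++) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    long long ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
            .count();
    // Keep the minimum over all runs.
    if (best < 0 || ns < best) best = ns;
  }
  return best;
}
```

You would call it once with the loop that has the __builtin_prefetch and once with the loop that does not, on the same data, and compare. Make sure the loop's result is actually used (e.g. stored or printed), otherwise the optimizer may remove the work you are trying to time.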

As Retired Ninja commented, prefetching in one loop and hoping the data will still be cached in a following loop (further down in your source code) is wrong.

You might instead try:

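for (size_t i = 0; i < v.get_num_edges(); i++) {
  fg::vertex_id_t id = v.get_neighbor(i);
  // __builtin_prefetch takes an address, hence the &
  // note: i+4 can exceed the last edge index near the end of the loop
  __builtin_prefetch (&neigh_vals[v.get_neighbor(i+4)]);
  res += neigh_vals[id];
}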

You could empirically replace the 4 with whatever constant works best.
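Since we don't know the real types, here is a self-contained sketch of the same idea on plain vectors, with the look-ahead distance as a parameter and a guard so the prefetch address is never computed past the end. The names sum_neighbors, idx and vals are stand-ins I made up (idx plays the role of v.get_neighbor(i), vals the role of neigh_vals), and it assumes every index in idx is valid for vals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the loop in the question:
// idx[i] plays the role of v.get_neighbor(i), vals the role of neigh_vals.
// Assumes every entry of idx is a valid index into vals.
std::uint64_t sum_neighbors(const std::vector<std::uint32_t>& idx,
                            const std::vector<std::uint64_t>& vals,
                            std::size_t distance) {
  std::uint64_t res = 0;
  const std::size_t n = idx.size();
  for (std::size_t i = 0; i < n; i++) {
    // Only form the prefetch address when the look-ahead element exists;
    // distance == 0 disables prefetching entirely.
    if (distance != 0 && distance < n - i) {
      // Arguments: address, 0 = read access, 3 = high temporal locality.
      __builtin_prefetch(&vals[idx[i + distance]], 0, 3);
    }
    assert(idx[i] < vals.size());  // debug-only check, gone with -DNDEBUG
    res += vals[idx[i]];
  }
  return res;
}
```

The result does not depend on distance, so you can try several values (0, 4, 8, 16, ...) on your real data and keep whichever is fastest, if any is faster at all. The extra branch has its own cost; another option would be to split the loop so the last few iterations run without prefetching.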

But I guess that the __builtin_prefetch above is useless, since the compiler is probably able to add it by itself, and it could even be harmful. It could crash the program when computing its argument gives undefined behavior, e.g. if v.get_neighbor(i+4) is undefined near the end of the loop. Prefetching an address outside of your address space will not fault by itself, but it could slow down your program. Please benchmark.

See this answer to a related question.

Notice that in C++ both operator [] and get_neighbor could be overloaded and become arbitrarily complex operations, so we cannot guess!

And there are cases where the hardware limits performance no matter which __builtin_prefetch calls you add (and adding them could even hurt performance).

BTW, you might pass -O3 -mtune=native -fdump-tree-ssa -S -fverbose-asm to GCC to better understand what the compiler is doing (then look inside the generated dump files and assembler files). Also, it does happen that -O3 produces slightly slower code than -O2.

You could consider explicit multithreading, OpenMP, or OpenCL if you have time to waste on optimization. Remember that premature optimization is evil. Did you benchmark? Did you profile your entire application?
