What’s wrong with compilers using associativity?

Here is how I understand your two cases: in the first case, you have a loop that takes N steps; in the second case, you manually merged two consecutive iterations of that loop into one, so you only need N/2 steps. The second version runs faster, and you are wondering why the dumb compiler couldn’t do this rewrite automatically.
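Since I do not have your exact code, here is a minimal sketch of the two versions as I read them, assuming the loop computes a hash-style accumulation a = M1*a + t[i] with M1 = 31 (the names a, t, and N are placeholders):

// Case 1: one element per iteration
int a = 0;
for (int i = 0; i < N; ++i)
  a = 31*a + t[i];

// Case 2: two consecutive iterations merged into one,
// using 31*(31*a + t[i]) + t[i+1] == 961*a + 31*t[i] + t[i+1]
int b = 0;
int i = 0;
for (; i + 1 < N; i += 2)
  b = 961*b + 31*t[i] + t[i+1];
if (i < N)                 // leftover element when N is odd
  b = 31*b + t[i];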

There is nothing that would prevent the compiler from doing such an optimization. But note that this rewrite of the original loop leads to a larger executable: there are more instructions inside the for loop, plus an additional if after the loop to handle the leftover iteration when N is odd.
For small N (say N=1 or N=3), the original loop is likely to be faster: there is less code, which means less branching and better use of the instruction cache, prefetching, and branch prediction. The rewrite made things faster in your case, but it may make things slower in other cases. It is not clear-cut whether the optimization is worth doing, and implementing it well in a compiler is highly nontrivial.

By the way, what you have done is very similar to loop vectorization, except that you did the parallel step manually and plugged in the result. Eric Brumer’s Compiler Confidential talk gives good insight into why rewriting loops is tricky in general and what the drawbacks are (larger executable size, potentially slower code in some cases). So compiler writers are very well aware of this optimization opportunity and are actively working on it, but it is highly nontrivial to do in general.
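To see why associativity is the key ingredient, consider a plain integer sum (a simplified illustration, not your hash loop; x and N are placeholder names). Associativity lets the additions be regrouped into independent partial sums that can be computed in parallel, e.g. in SIMD lanes or simply with two accumulators:

// Sequential grouping:       ((((s + x[0]) + x[1]) + x[2]) + x[3]) + ...
// Regrouped (associativity): (x[0] + x[2] + ...) + (x[1] + x[3] + ...)
long long s0 = 0, s1 = 0;
int i = 0;
for (; i + 1 < N; i += 2) {
  s0 += x[i];      // "even" lane
  s1 += x[i + 1];  // "odd" lane
}
long long s = s0 + s1;
if (i < N)
  s += x[i];       // leftover element when N is odd

For your multiply-and-add recurrence the regrouping is less obvious, because the multiplier has to be folded in (that is where the 961 = 31*31 constant in the sketch above comes from), which is part of what makes the transformation nontrivial for a compiler.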


Please try something for me:

int a = 0;
for (int i=0; i<N; ++i) 
  a = ((a<<5) - a) + t[i];

assuming M1=31. In principle, the compiler should be smart enough to rewrite 31*a into (a<<5) - a by itself, but I am curious whether it really does that.
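If you want to check, one easy way (just a suggestion, assuming GCC or Clang) is to compile both versions with -O2 -S, or paste them into Compiler Explorer (godbolt.org), and look at whether the generated loop body contains an imul instruction or a shift-and-subtract sequence.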
