compiler-optimization - w3toppers.com

Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs

gcc optimization flag -O3 makes code slower than -O2

gcc -O3 uses a cmov for the conditional, so it lengthens the loop-carried dependency chain to include a cmov (which is 2 uops and 2 cycles of latency on your Intel Sandybridge CPU, according to Agner Fog’s instruction tables. See also the x86 tag wiki). This is one of the cases where cmov sucks. If …
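To make "loop-carried dependency chain" concrete, here is a minimal sketch of a loop where each iteration's conditional feeds the next one. The function names and the running-maximum task are hypothetical, not taken from the original question. Whether a given compiler emits a branch, a cmov, or vectorized code for either form depends on the compiler version, target, and flags, so the way to check is to compare the output of something like gcc -O2 -S and gcc -O3 -S.

```c
#include <stddef.h>

/* Hypothetical example: running maximum over an array (requires n >= 1).
   `best` is carried from one iteration to the next, so whatever
   instruction implements the conditional update sits on the
   loop-carried dependency chain. */
int max_with_if(const int *v, size_t n)
{
    int best = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] > best)      /* often compiled to a compare and branch */
            best = v[i];
    }
    return best;
}

/* Same computation written as a select. Compilers frequently turn this
   shape into a cmov, but nothing in the C source guarantees it. */
int max_with_select(const int *v, size_t n)
{
    int best = v[0];
    for (size_t i = 1; i < n; i++)
        best = (v[i] > best) ? v[i] : best;
    return best;
}
```

Both functions return the same value. Any difference is in the generated asm: a correctly predicted branch takes the conditional off the dependency chain, while a cmov keeps it on the chain every iteration.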

Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?

Why are elementwise additions much faster in separate loops than in a combined loop?

C loop optimization help for final assignment (with compiler optimization disabled)

Re-posting a modified version of my answer from optimized sum of an array of doubles in C, since that question got voted down to -5. The OP of the other question phrased it more as “what else is possible”, so I took him at his word and info-dumped about vectorizing and tuning for current CPU …

How to compile Tensorflow with SSE4.2 and AVX instructions?

Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?

How to remove “noise” from GCC/clang assembly output?

Stripping out the .cfi directives, unused labels, and comment lines is a solved problem: the scripts behind Matt Godbolt’s compiler explorer are open source on its github project. It can even do colour highlighting to match source lines to asm lines (using the debug info). You can set it up locally so you can feed …
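As a rough illustration of what such filtering involves, here is a minimal sketch of a line classifier. It is not the Compiler Explorer implementation, and is_noise_line is a hypothetical name. It assumes GNU assembler syntax for x86, where # starts a comment, and it only handles .cfi directives and whole-line comments. Removing unused labels needs a second pass over the file to see which labels are referenced, so it is left out.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `line` is a .cfi directive or a
   whole-line '#' comment (GNU as, x86 syntax assumed), 0 otherwise.
   Blank lines, labels, and instructions are kept. */
int is_noise_line(const char *line)
{
    /* Skip leading whitespace so indented directives are recognised. */
    while (*line != '\0' && isspace((unsigned char)*line))
        line++;

    if (*line == '#')
        return 1;                          /* comment line */
    if (strncmp(line, ".cfi_", 5) == 0)
        return 1;                          /* call-frame-info directive */
    return 0;
}
```

A driver would read the compiler's -S output line by line, for example with fgets, and print only the lines for which is_noise_line returns 0.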

Why doesn’t GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

Because Floating Point Math is not Associative. The way you group the operands in floating point multiplication has an effect on the numerical accuracy of the answer. As a result, most compilers are very conservative about reordering floating point calculations unless they can be sure that the answer will stay the same, or unless you …
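A small sketch can show the grouping effect directly. The function names are hypothetical, and the example assumes IEEE 754 double arithmetic evaluated without excess precision, as is typical on x86-64. The inputs are chosen to produce an extreme case, overflow, rather than the more common difference of a few low-order bits.

```c
/* Hypothetical helpers: the same three operands, grouped two ways. */
double mul_left(double a, double b, double c)
{
    return (a * b) * c;   /* a*b is rounded first, then multiplied by c */
}

double mul_right(double a, double b, double c)
{
    return a * (b * c);   /* b*c is rounded first, then multiplied by a */
}

/* Returns 1 when the two groupings give different doubles. */
int grouping_matters(double a, double b, double c)
{
    return mul_left(a, b, c) != mul_right(a, b, c);
}
```

With a = 1e308, b = 10.0, c = 0.1, the left grouping overflows to infinity at a*b, while the right grouping computes b*c as exactly 1.0 and returns 1e308. A compiler that regrouped the multiplications on its own would change the result of a conforming program, which is why it does not do so by default.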