assembly
Why are loops always compiled into “do…while” style (tail jump)?
Related: asm loop basics: While, Do While, For loops in Assembly Language (emu8086). Fewer instructions / uops inside the loop = better. Structuring the code outside the loop to achieve this is very often a good idea. Sometimes this requires “loop rotation” (peeling part of the first iteration so the actual loop body has the …
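As a rough source-level illustration of the tail-jump idea, the sketch below writes the same loop two ways: with the test at the top, and in the "rotated" form with one check before entry and the conditional branch at the bottom. The function names `sum_top_test` and `sum_rotated` are made up for this example, and optimizing compilers typically perform this transformation on their own, so this only mirrors the shape of the generated code.

```cpp
#include <cstddef>

// Top-tested form: taken literally, each iteration needs a conditional
// branch at the top plus an unconditional jump back from the bottom.
long sum_top_test(const int* data, std::size_t n) {
    long total = 0;
    std::size_t i = 0;
    while (i < n) {
        total += data[i];
        ++i;
    }
    return total;
}

// Rotated form: one check before entering the loop, then a do...while
// whose single conditional branch sits at the bottom of the body.
long sum_rotated(const int* data, std::size_t n) {
    long total = 0;
    if (n == 0) {
        return total;  // skip the loop entirely when there is nothing to do
    }
    std::size_t i = 0;
    do {
        total += data[i];
        ++i;
    } while (i < n);   // the only branch executed per iteration
    return total;
}
```

Both functions return the same result for any input; the rotated version just makes the "check once, then branch at the tail" structure explicit.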
Can num++ be atomic for ‘int num’?
This is absolutely what C++ defines as a Data Race that causes Undefined Behaviour, even if one compiler happened to produce code that did what you hoped on some target machine. You need to use std::atomic for reliable results, but you can use it with memory_order_relaxed if you don’t care about reordering. See below for …
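A minimal sketch of the std::atomic approach mentioned above, assuming two threads that only need the final count to be correct and do not rely on ordering with respect to other memory operations. The helper name `count_with_two_threads` is hypothetical, and some toolchains need `-pthread` when building code that uses std::thread.

```cpp
#include <atomic>
#include <thread>

// Two threads each increment a shared counter per_thread times.
// With a plain int this would be a data race; std::atomic<int> makes
// every increment an indivisible read-modify-write.
int count_with_two_threads(int per_thread) {
    std::atomic<int> num{0};

    auto work = [&num, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            // Relaxed ordering: atomicity only, no ordering guarantees
            // relative to other loads and stores.
            num.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();  // join() synchronizes with thread completion, so the
    b.join();  // load below observes all increments from both threads.

    return num.load(std::memory_order_relaxed);
}
```

Calling `count_with_two_threads(100000)` should return 200000 on every run, whereas the same loop on a non-atomic `int` is undefined behaviour and may lose updates.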
Fastest way to do horizontal SSE vector sum (or other reduction)
In general for any kind of vector horizontal reduction, extract / shuffle high half to line up with low, then vertical add (or min/max/or/and/xor/multiply/whatever); repeat until there’s just a single element (with high garbage in the rest of the vector). If you start with vectors wider than 128-bit, narrow in half until you get …
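The shuffle-then-vertical-add pattern can be sketched for a 128-bit vector of four floats as follows. This assumes an x86 target with SSE available and uses only SSE1 intrinsics; the function name `hsum_ps` is chosen for this example, and other shuffle choices may be faster on a given CPU.

```cpp
#include <immintrin.h>

// Horizontal sum of the four floats in v = [a0, a1, a2, a3].
float hsum_ps(__m128 v) {
    // Swap neighbours within each pair: shuf = [a1, a0, a3, a2].
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    // Vertical add: sums = [a0+a1, a0+a1, a2+a3, a2+a3].
    __m128 sums = _mm_add_ps(v, shuf);
    // Bring the high half of sums down to the low half: element 0 = a2+a3.
    shuf = _mm_movehl_ps(shuf, sums);
    // Scalar add of the low elements: (a0+a1) + (a2+a3).
    sums = _mm_add_ss(sums, shuf);
    // Extract the low element; the upper elements are leftover garbage.
    return _mm_cvtss_f32(sums);
}
```

For example, `hsum_ps(_mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f))` evaluates to 10.0f. The same two-step shape works for min, max, or bitwise reductions by swapping the add intrinsics for the corresponding operation.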