How to force GCC to assume that a floating-point expression is non-negative?

You can write assert(x*x >= 0.f) as a compile-time promise instead of a runtime check as follows in GNU C: #include <cmath> float test1 (float x) { float tmp = x*x; if (!(tmp >= 0.0f)) __builtin_unreachable(); return std::sqrt(tmp); } (related: What optimizations does __builtin_unreachable facilitate? You could also wrap if(!x)__builtin_unreachable() in a macro and call … Read more
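As a rough illustration of the macro idea mentioned in the excerpt above, here is a minimal sketch. The macro name `ASSUME` and the function name `sqrt_of_square` are hypothetical, chosen only for this example; `__builtin_unreachable()` is the GCC/Clang builtin the excerpt already uses. On other compilers the macro falls back to doing nothing.

```cpp
#include <cmath>

// ASSUME is a hypothetical name for illustration; it is not provided by GCC.
// On GCC/Clang it tells the optimizer that `cond` always holds.
// If `cond` is ever false at run time, the behavior is undefined.
#if defined(__GNUC__) || defined(__clang__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) ((void)0)  // no-op fallback on other compilers
#endif

// Hypothetical example function mirroring test1 from the excerpt.
float sqrt_of_square(float x) {
    float tmp = x * x;
    ASSUME(tmp >= 0.0f);  // promise: tmp is non-negative (and therefore not NaN)
    return std::sqrt(tmp);
}
```

Because the condition is written as `tmp >= 0.0f`, the promise also rules out NaN, so calling this function with a NaN argument is undefined behavior. Whether the compiler actually uses the promise to drop the errno-setting path of `sqrt` depends on the compiler version and flags.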

Does Skylake need vzeroupper for turbo clocks to recover after a 512-bit instruction that only reads a ZMM register, writing a k mask?

No, a vpcmpeqb into a mask register does not trigger slow mode if you use a zmm register as one of the comparands, at least on SKX. This is also true of any other instruction (as far as I tested) that only reads the key 512-bit registers (the key registers being zmm0 – … Read more

How can the rep stosb instruction execute faster than the equivalent loop?

In modern CPUs, the microcoded implementations of rep stosb and rep movsb actually use stores that are wider than 1 byte, so they can go much faster than one byte per clock. (Note that this only applies to stos and movs, not repe cmpsb or repne scasb. They’re still slow, unfortunately, like at best 2 cycles per byte compared … Read more
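For readers who want to try rep stosb from C++, the following is a minimal sketch using GNU extended inline assembly. The helper name `fill_bytes` is hypothetical, and the sketch assumes GCC or Clang targeting x86 with the direction flag clear, as the usual calling conventions guarantee. On any other target it simply falls back to `std::memset`.

```cpp
#include <cstddef>
#include <cstring>

// fill_bytes is a hypothetical helper name used only for this illustration.
// It stores `value` into `n` consecutive bytes starting at `dst`.
inline void fill_bytes(void* dst, unsigned char value, std::size_t n) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // rep stosb: store AL to [RDI/EDI], RCX/ECX times.
    // "+D" and "+c" mark the pointer and count as read-write,
    // because the instruction advances one and decrements the other.
    asm volatile("rep stosb"
                 : "+D"(dst), "+c"(n)
                 : "a"(value)
                 : "memory");
#else
    std::memset(dst, value, n);  // portable fallback
#endif
}
```

In practice `memset` is usually the better choice, since library implementations already pick between rep stosb and vector stores depending on size and CPU; the sketch is only meant for experimenting with the instruction itself.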

Does using xor reg, reg give advantage over mov reg, 0? [duplicate]

An actual answer for you: Section 3.5.1.7 of the Intel 64 and IA-32 Architectures Optimization Reference Manual is where you want to look. In short, there are situations where an xor or a mov may be preferred. The issues center around dependency chains and preservation of condition codes. In processors based on Intel Core microarchitecture, a number … Read more

Is the conditional operator slow?

Very odd; perhaps .NET optimization is backfiring in your case. The author disassembled several versions of ternary expressions and found that they are identical to if-statements, with one small difference. The ternary statement sometimes produces code that tests the opposite of the condition you would expect, as in it tests that the subexpression is false instead … Read more