What are the costs of failed store-to-load forwarding on x86?

This is not really a full answer, but it is still evidence that the penalty is visible. MSVC 2022 benchmark, compiled with /std:c++latest:

    #include <chrono>
    #include <iostream>

    struct alignas(16) S { char* a; int* b; };

    extern "C" void init_fused_copy_unfused(int n, S & s2, S & s1);
    extern "C" void init_fused_copy_fused(int n, S & s2, S & …
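To make the term concrete, here is a deliberately simplified model of when a load can be served from a single earlier store. It assumes forwarding succeeds only when the load's bytes lie entirely inside that one store; real microarchitectures have additional, model-specific rules. The names `Access` and `can_forward` are hypothetical and do not come from the benchmark above.

```cpp
#include <cstdint>

// Byte range [offset, offset + size) touched by one store or one load.
struct Access {
    std::uint64_t offset;
    std::uint64_t size;
};

// Simplified assumption: forwarding works only if the load is fully
// contained in one earlier store. A wide load that needs bytes from two
// narrower stores fails this check.
constexpr bool can_forward(Access store, Access load) {
    return load.offset >= store.offset &&
           load.offset + load.size <= store.offset + store.size;
}
```

Under this model, writing the two 8-byte members of a 16-byte struct separately and then reading all 16 bytes with one load fails for either store, whereas one 16-byte store followed by an 8-byte load of either half succeeds.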

How can I mitigate the impact of the Intel jcc erratum on gcc?

By compiler:

GCC: -Wa,-mbranches-within-32B-boundaries
clang (10+): -mbranches-within-32B-boundaries as a compiler option directly, not -Wa.
MSVC: /QIntel-jcc-erratum (see "Intel JCC Erratum – what is the effect of prefixes used for mitigation?")
ICC: TODO, look for docs.

The GNU toolchain does the mitigation in the assembler, with as -mbranches-within-32B-boundaries, which enables (GAS manual: x86 options):

-malign-branch-boundary=32 (care about 32-byte boundaries). …
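As a rough illustration of what the 32-byte boundary means here: the erratum concerns jump instructions that cross a 32-byte boundary or end on one. The sketch below checks that condition for an instruction at a given address and length. `touches_32b_boundary` is a hypothetical helper for reasoning about disassembly, not part of any toolchain, and it assumes a length of at least 1.

```cpp
#include <cstdint>

// Boundary size the mitigation cares about.
constexpr std::uint64_t kBoundary = 32;

// True if an instruction occupying [addr, addr + len) either spans two
// 32-byte blocks or has its last byte at the end of a block.
// Assumes len >= 1.
constexpr bool touches_32b_boundary(std::uint64_t addr, std::uint64_t len) {
    const std::uint64_t first_block = addr / kBoundary;             // block of first byte
    const std::uint64_t last_block = (addr + len - 1) / kBoundary;  // block of last byte
    const bool crosses = first_block != last_block;
    const bool ends_on = (addr + len) % kBoundary == 0;
    return crosses || ends_on;
}
```

For example, a 2-byte jump at 0x1f crosses the boundary at 0x20, a 2-byte jump at 0x1e ends on it, and a 2-byte jump at 0x10 does neither.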