Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?)

Extensions that introduce new architectural state require special OS support, because the OS has to save/restore more data on context switches. So from the OS's perspective, there’s nothing extra it needs to do to let user-space code run SSSE3 instructions, if the OS supports SSE. SSE, AVX, and AVX512 are the extensions that introduced … Read more
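
The OS-support requirement described above can be checked at run time. The following is a minimal sketch for GCC or Clang on x86, not a definitive implementation: it reads the OSXSAVE and AVX bits from CPUID leaf 1, then uses XGETBV to see whether the OS has enabled saving of XMM and YMM state in XCR0. The names `AvxSupport` and `query_avx_support` are hypothetical, and on other compilers or architectures the sketch simply reports everything as unavailable.

```cpp
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

// Hypothetical result type: each flag is one step of the check.
struct AvxSupport {
    bool cpu_avx = false;      // CPUID.1:ECX bit 28, CPU implements AVX
    bool osxsave = false;      // CPUID.1:ECX bit 27, OS uses XSAVE and XGETBV is usable
    bool ymm_enabled = false;  // XCR0 bits 1 and 2, OS saves XMM and YMM state
    bool usable = false;       // all of the above
};

inline AvxSupport query_avx_support() {
    AvxSupport r;
#if HAVE_X86_CPUID
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        r.osxsave = ((ecx >> 27) & 1u) != 0;
        r.cpu_avx = ((ecx >> 28) & 1u) != 0;
    }
    // XGETBV faults unless OSXSAVE is set, so only execute it in that case.
    if (r.osxsave) {
        std::uint32_t lo = 0, hi = 0;
        __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
        r.ymm_enabled = (lo & 0x6u) == 0x6u;
    }
    r.usable = r.cpu_avx && r.osxsave && r.ymm_enabled;
#endif
    return r;  // all false when the check cannot be performed
}
```

A caller would dispatch to an AVX code path only when `query_avx_support().usable` is true. Checking the CPUID feature bit alone is not enough, because the CPU can implement AVX while the OS does not save the YMM registers.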

AVX/SSE version of xorshift128+

For anyone else who might reach this question, I think this C++ code correctly implements 4 xorshift128plus generators running in parallel, using AVX2:
    __m256i xorshift128plus_avx2(__m256i &state0, __m256i &state1) {
        __m256i s1 = state0;
        const __m256i s0 = state1;
        state0 = s0;
        s1 = _mm256_xor_si256(s1, _mm256_slli_epi64(s1, 23));
        state1 = _mm256_xor_si256(_mm256_xor_si256(_mm256_xor_si256(s1, s0), _mm256_srli_epi64(s1, 18)), _mm256_srli_epi64(s0, 5));
        return … Read more
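
For comparison, here is a scalar sketch of a single xorshift128+ generator using the same shift constants (23, 18, 5) as the AVX2 excerpt, which can serve as a reference when testing each 64-bit lane of a vectorized version. The struct and function names are hypothetical. The output function (the sum of the two state words before the update) is an assumption, since the excerpt is cut off before its return expression.

```cpp
#include <cstdint>

// Hypothetical holder for the two 64-bit state words of one generator.
// The state must not be all zero, or the generator only ever produces zero.
struct Xorshift128Plus {
    std::uint64_t state0;
    std::uint64_t state1;
};

inline std::uint64_t xorshift128plus_next(Xorshift128Plus& g) {
    std::uint64_t s1 = g.state0;
    const std::uint64_t s0 = g.state1;
    // Assumed output: sum of the state words before the update.
    const std::uint64_t result = s0 + s1;
    g.state0 = s0;
    s1 ^= s1 << 23;                               // same step as _mm256_slli_epi64(s1, 23)
    g.state1 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5);  // same step as the srli_epi64 by 18 and 5
    return result;
}
```

Seeding four such generators with the same values as the four lanes of `state0` and `state1` should give matching state updates lane by lane, which is a simple way to validate the vectorized state transition.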

How to detect SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI availability at compile-time?

Most compilers will automatically define: __SSE__ __SSE2__ __SSE3__ __AVX__ __AVX2__ etc., according to whatever command line switches you are passing. You can easily check this with gcc (or gcc-compatible compilers such as clang), like this:
    $ gcc -msse3 -dM -E - < /dev/null | egrep "SSE|AVX" | sort
    #define __SSE__ 1
    #define __SSE2__ 1
    #define … Read more
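
In source code, those predefined macros can be tested with the preprocessor. The sketch below turns three of them into compile-time constants and reports the highest level enabled for the current translation unit. The names `kHasSse2`, `kHasAvx`, `kHasAvx2`, and `simd_level_name` are hypothetical, and only these three macros are checked.

```cpp
// Each constant reflects whether the compiler was told it may emit that
// instruction set (for example via -msse2, -mavx, -mavx2, or -march=...).
#ifdef __SSE2__
constexpr bool kHasSse2 = true;
#else
constexpr bool kHasSse2 = false;
#endif

#ifdef __AVX__
constexpr bool kHasAvx = true;
#else
constexpr bool kHasAvx = false;
#endif

#ifdef __AVX2__
constexpr bool kHasAvx2 = true;
#else
constexpr bool kHasAvx2 = false;
#endif

// Name of the highest level enabled at compile time among those checked.
constexpr const char* simd_level_name() {
    return kHasAvx2 ? "AVX2"
         : kHasAvx  ? "AVX"
         : kHasSse2 ? "SSE2"
                    : "baseline";
}
```

These constants describe what the compiler was allowed to generate, not what the machine running the binary supports, so they are suited to selecting code paths at build time rather than to run-time dispatch.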