Visual Studio 2017: _mm_load_ps often compiled to movups

On recent versions of Visual Studio and the Intel Compiler (recent meaning roughly post-2013?), the compiler rarely generates aligned SIMD loads/stores anymore, even when you use the aligned intrinsics such as _mm_load_ps.

When compiling for AVX or higher:

  • The Microsoft compiler (>VS2013?) doesn’t generate aligned loads, but it still generates aligned stores.
  • The Intel compiler (> Parallel Studio 2012?) doesn’t generate aligned loads or stores at all anymore. You’ll still see them in ICC-compiled binaries, but inside Intel’s hand-optimized library routines like memset().
  • As of GCC 6.1, GCC still generates aligned loads/stores when you use the aligned intrinsics.

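The compiler is allowed to do this because nothing is lost when the code is written correctly. On all processors starting from Nehalem, an unaligned load/store (movups) has no penalty over an aligned one (movaps) when the address is actually aligned. The only behavioral difference is that the aligned instruction faults on a misaligned address.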
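A minimal sketch of that equivalence, assuming an x86 target with SSE and a C11 compiler (the function name aligned_and_unaligned_loads_agree is hypothetical): on a 16-byte-aligned address, _mm_load_ps and _mm_loadu_ps yield the same vector, so swapping movaps for movups changes nothing for correct code.

```c
#include <assert.h>     /* assert */
#include <stdint.h>     /* uintptr_t */
#include <string.h>     /* memcmp */
#include <immintrin.h>  /* SSE: _mm_load_ps, _mm_loadu_ps, _mm_storeu_ps */

/* Hypothetical helper: returns 1 if an aligned load and an unaligned load
   of the same 16-byte-aligned address produce bit-identical vectors. */
int aligned_and_unaligned_loads_agree(const float *p)
{
    /* Precondition for _mm_load_ps: p must be 16-byte aligned. */
    assert(((uintptr_t)p & 15u) == 0);

    __m128 va = _mm_load_ps(p);   /* aligned intrinsic (movaps, if emitted as written) */
    __m128 vb = _mm_loadu_ps(p);  /* unaligned intrinsic (movups) */

    float a[4], b[4];
    _mm_storeu_ps(a, va);         /* unaligned stores: a and b need no alignment */
    _mm_storeu_ps(b, vb);
    return memcmp(a, b, sizeof a) == 0;
}
```

Call it with an aligned buffer, e.g. one declared as `_Alignas(16) float buf[4]`.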

Microsoft’s stance on this issue is that it “helps the programmer by not crashing”. Unfortunately, I can no longer find the original source for this statement. In my opinion, this achieves the exact opposite, because it hides misalignment penalties. From a correctness standpoint, it also hides incorrect code.
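Since the compiler no longer catches a misaligned _mm_load_ps/_mm_store_ps for you, one option is to check alignment yourself in debug builds. This is a sketch assuming SSE, with hypothetical wrapper names (is_aligned_16, checked_load_ps, checked_store_ps); the checks vanish when NDEBUG is defined.

```c
#include <assert.h>     /* assert */
#include <stdint.h>     /* uintptr_t */
#include <immintrin.h>  /* SSE: _mm_load_ps, _mm_store_ps */

/* Hypothetical helper: nonzero if p is 16-byte aligned. */
static inline int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical wrappers: the compiler may silently emit movups for these
   intrinsics, so re-check the alignment the aligned intrinsics demand. */
static inline __m128 checked_load_ps(const float *p)
{
    assert(is_aligned_16(p) && "misaligned _mm_load_ps");
    return _mm_load_ps(p);
}

static inline void checked_store_ps(float *p, __m128 v)
{
    assert(is_aligned_16(p) && "misaligned _mm_store_ps");
    _mm_store_ps(p, v);
}
```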

Whatever the case is, unconditionally using unaligned load/stores does simplify the compiler a bit.

New Revelations:

  • Starting with Parallel Studio 2018, the Intel Compiler no longer generates aligned moves at all, even for pre-Nehalem targets.
  • Starting with Visual Studio 2017, the Microsoft Compiler also no longer generates aligned moves at all, even when targeting pre-AVX hardware.

Both cases result in inevitable performance degradation on older processors, since pre-Nehalem hardware pays a penalty for unaligned moves even when the address is aligned. But it seems that this is intentional, as both Intel and Microsoft no longer care about old processors.


The only load/store intrinsics that are immune to this are the non-temporal ones (e.g. _mm_stream_ps). They have no unaligned equivalents, so the compiler has no choice but to emit the aligned instruction.

So if you just want to test your code for correctness, you can replace the aligned load/store intrinsics with non-temporal ones. But be careful not to let this slip into production code: NT loads/stores (NT stores in particular) are a double-edged sword that can hurt you if you don’t know what you’re doing.
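A minimal sketch of that test-only substitution, assuming SSE plus a hypothetical ALIGNMENT_TEST build flag, STORE_PS macro and copy_floats function: stores go through _mm_stream_ps only when the flag is defined, so normal builds keep the regular stores. It covers stores only; the non-temporal load _mm_stream_load_si128 requires SSE4.1 and operates on integer vectors.

```c
#include <assert.h>     /* assert */
#include <stddef.h>     /* size_t */
#include <immintrin.h>  /* SSE: _mm_load_ps, _mm_store_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical switch: in test builds, route aligned stores through the
   non-temporal _mm_stream_ps, which has no unaligned form and therefore
   still faults on a misaligned address. Never ship this configuration. */
#ifdef ALIGNMENT_TEST
#define STORE_PS(p, v) _mm_stream_ps((p), (v))
#else
#define STORE_PS(p, v) _mm_store_ps((p), (v))
#endif

/* Hypothetical example: copies n floats between 16-byte-aligned buffers. */
void copy_floats(float *dst, const float *src, size_t n)
{
    assert(n % 4 == 0);  /* processes whole 4-float vectors only */
    for (size_t i = 0; i < n; i += 4) {
        STORE_PS(dst + i, _mm_load_ps(src + i));
    }
#ifdef ALIGNMENT_TEST
    _mm_sfence();  /* NT stores are weakly ordered; fence before later stores */
#endif
}
```

Build the test configuration with -DALIGNMENT_TEST (GCC/Clang) or /DALIGNMENT_TEST (MSVC); a misaligned destination then faults instead of silently running at reduced speed.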
