How to implement “_mm_storeu_epi64” without aliasing problems?

SSE intrinsics is one of those niche corner cases where you have to push the rules a bit.

Since these intrinsics are compiler extensions (somewhat standardized by Intel), they are already outside the specification of the C and C++ language standards. So it’s somewhat self-defeating to try to be “standard compliant” while using a feature that clearly is not.

Despite the fact that the SSE intrinsic libraries try to act like normal 3rd party libraries, underneath, they are all specially handled by the compiler.


The Intent:

The SSE intrinsics were likely designed from the beginning to allow aliasing between the vector and scalar types – since a vector really is just an aggregate of the scalar type.

But whoever designed the SSE intrinsics probably wasn’t a language pedant.
(That’s not too surprising. Hard-core low-level performance programmers and language lawyering enthusiasts tend to be very different groups of people who don’t always get along.)

We can see evidence of this in the load/store intrinsics:

  • __m128i _mm_stream_load_si128(__m128i* mem_addr) – A load intrinsic that takes a non-const pointer?
  • void _mm_storeu_pd(double* mem_addr, __m128d a) – What if I want to store to __m128i*?

The strict aliasing problems are a direct result of these poor prototypes.

Starting from AVX512, the intrinsics have all been converted to void* to address this problem:

  • __m512d _mm512_load_pd(void const* mem_addr)
  • void _mm512_store_epi64 (void* mem_addr, __m512i a)

Compiler Specifics:

  • Visual Studio defines each of the SSE/AVX types as a union of the scalar types. This by itself allows strict-aliasing. Furthermore, Visual Studio doesn’t do strict-aliasing so the point is moot:

  • The Intel Compiler has never failed me with all sorts of aliasing. It probably doesn’t do strict-aliasing either – though I’ve never found any reliable source for this.

  • GCC does do strict-aliasing, but from my experience, not across function boundaries. It has never failed me to cast pointers which are passed in (on any type). GCC also declares SSE types as __may_alias__ thereby explicitly allowing it to alias other types.


My Recommendation:

  • For function parameters that are of the wrong pointer type, just cast it.
  • For variables declared and aliased on the stack, use a union. That union will already be aligned so you can read/write to them directly without intrinsics. (But be aware of store-forwarding issues that come with interleaving vector/scalar accesses.)
  • If you need to access a vector both as a whole and by its scalar components, consider using insert/extract intrinsics instead of aliasing.
  • When using GCC, turn on -Wall or -Wstrict-aliasing. It will tell you about strict-aliasing violations.

Leave a Comment