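A union is probably the most portable way to do this. Type punning through a union is well defined in C (C99 and later); in C++ it is formally undefined behavior, although GCC documents it as a supported extension. The union approach looks like this: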
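#include <assert.h>
#include <xmmintrin.h> // SSE: __m128

typedef union {
    __m128 v;    // SSE 4 x float vector
    float a[4];  // scalar array of 4 floats
} U;

float vectorGetByIndex(__m128 V, unsigned int i)
{
    U u;
    assert(i <= 3); // valid lane indices are 0..3
    u.v = V;        // write the whole vector
    return u.a[i];  // read back a single lane
}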
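If union type punning is a concern (for example when compiling as strict C++), one alternative is to store the vector into a plain float array with `_mm_storeu_ps` and index that. The following is only a sketch of that idea, not the method from the answer above; the function name `vectorGetByIndexStore` is made up for illustration.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_storeu_ps */

/* Hypothetical helper (name is an assumption, not from the original answer).
 * Copies the four lanes to a local array, then returns lane i. */
static float vectorGetByIndexStore(__m128 V, unsigned int i)
{
    float a[4];
    assert(i <= 3);      /* valid lane indices are 0..3 */
    _mm_storeu_ps(a, V); /* unaligned store, so `a` needs no special alignment */
    return a[i];
}
```

For a vector built with `_mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f)`, index 0 refers to the first argument and index 3 to the last. Compilers can often optimize the store and reload away when `i` is a compile-time constant, but that is not guaranteed.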