I have implemented Vecmathlib (https://bitbucket.org/eschnett/vecmathlib/) as a generic library for two other projects: the Einstein Toolkit and pocl (http://pocl.sourceforge.net/). Vecmathlib is open source and written in C++.