Fastest way to do horizontal vector sum with AVX instructions [duplicate]
If you have two `__m256d` vectors `x1` and `x2` that each contain four doubles you want to sum horizontally, you could do:

```c
__m256d x1, x2;
// calculate 4 two-element horizontal sums:
// lower 64 bits contain x1[0] + x1[1]
// next 64 bits contain x2[0] + x2[1]
// next 64 bits contain x1[2] + …
```
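The excerpt stops before the actual intrinsic calls. Below is a minimal sketch of the usual way to finish it with `_mm256_hadd_pd`, whose output matches the lane layout the comments describe. This is a reconstruction of the general technique, not the original answer's code. It assumes GCC or Clang on x86 (for `__attribute__((target("avx")))` and `__builtin_cpu_supports`), and the helper names `hsum2_pd`, `hsum2_avx` and `hsum2` are hypothetical.

```c
/* Sketch: horizontal sums of two 4-double vectors at once with AVX.
 * Assumes GCC or Clang on x86; helper names are hypothetical. */
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1

/* Returns {x1[0]+x1[1]+x1[2]+x1[3], x2[0]+x2[1]+x2[2]+x2[3]}.
 * target("avx") lets this compile without -mavx; the caller must
 * check at run time that the CPU supports AVX. */
__attribute__((target("avx")))
static inline __m128d hsum2_pd(__m256d x1, __m256d x2)
{
    /* In-lane pairwise sums:
     * {x1[0]+x1[1], x2[0]+x2[1], x1[2]+x1[3], x2[2]+x2[3]} */
    __m256d sum = _mm256_hadd_pd(x1, x2);
    /* Upper 128 bits: {x1[2]+x1[3], x2[2]+x2[3]} */
    __m128d hi = _mm256_extractf128_pd(sum, 1);
    /* Lower 128 bits: {x1[0]+x1[1], x2[0]+x2[1]};
     * the cast normally compiles to no instruction */
    __m128d lo = _mm256_castpd256_pd128(sum);
    /* Cross-lane add yields the two full sums */
    return _mm_add_pd(lo, hi);
}

/* Array-based wrapper so non-AVX code never passes __m256d by value. */
__attribute__((target("avx")))
static void hsum2_avx(const double *a, const double *b, double *out)
{
    __m128d r = hsum2_pd(_mm256_loadu_pd(a), _mm256_loadu_pd(b));
    _mm_storeu_pd(out, r);
}
#endif

/* out[0] = sum of a[0..3], out[1] = sum of b[0..3]. */
void hsum2(const double *a, const double *b, double *out)
{
#ifdef HAVE_X86
    if (__builtin_cpu_supports("avx")) {
        hsum2_avx(a, b, out);
        return;
    }
#endif
    /* Scalar fallback, using the same pairing order as the AVX path */
    out[0] = (a[0] + a[1]) + (a[2] + a[3]);
    out[1] = (b[0] + b[1]) + (b[2] + b[3]);
}
```

Because `_mm256_hadd_pd` only adds within each 128-bit lane, one `_mm256_extractf128_pd` plus one `_mm_add_pd` is needed to combine the lanes. If you need a scalar, `_mm_cvtsd_f64(r)` gives the low double (the sum of `x1`). Whether `hadd` is actually the fastest choice depends on the microarchitecture; on many Intel cores it decodes to several uops, so a plain shuffle-and-add sequence can be competitive and is worth benchmarking.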