Why is this SSE code 6 times slower without VZEROUPPER on Skylake?
