Counting 1 bits (population count) on large data using AVX-512 or AVX-2

AVX-2

@HadiBreis’ comment links to an article on fast population count with SSSE3 by Wojciech Muła. The article links to this GitHub repository, which contains the following AVX-2 implementation. It is based on a vectorized lookup instruction and a 16-entry lookup table that holds the bit count of each nibble.

#include <immintrin.h>
#include <x86intrin.h>
…
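To make the nibble-lookup idea above concrete, here is a minimal sketch. It is not the repository's code: the function names (`popcount_scalar`, `popcount_avx2`, `popcount_bytes`) are hypothetical, and it assumes GCC or Clang on x86-64, since it relies on `__attribute__((target("avx2")))`, `__builtin_cpu_supports` and `__builtin_popcount`. Each byte is split into its low and high nibble, both nibbles are looked up in the 16-entry table with the byte shuffle `_mm256_shuffle_epi8`, and the per-byte counts are summed into 64-bit lanes with `_mm256_sad_epu8`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>

/* Scalar fallback; also handles the tail that does not fill a 32-byte vector. */
static uint64_t popcount_scalar(const uint8_t *data, size_t len) {
    uint64_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += (uint64_t)__builtin_popcount(data[i]);
    return total;
}

/* Hypothetical AVX2 kernel: nibble lookup via byte shuffle.
 * The target attribute (a GCC/Clang assumption) lets this compile
 * without passing -mavx2 for the whole translation unit. */
__attribute__((target("avx2")))
static uint64_t popcount_avx2(const uint8_t *data, size_t len) {
    /* Bit counts of the nibbles 0..15, repeated for both 128-bit lanes,
     * because the shuffle works within each lane separately. */
    const __m256i lookup = _mm256_setr_epi8(
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
    const __m256i low_mask = _mm256_set1_epi8(0x0f);
    const __m256i zero = _mm256_setzero_si256();
    __m256i total = zero;
    size_t i = 0;

    for (; i + 32 <= len; i += 32) {
        __m256i v  = _mm256_loadu_si256((const __m256i *)(data + i));
        __m256i lo = _mm256_and_si256(v, low_mask);
        __m256i hi = _mm256_and_si256(_mm256_srli_epi16(v, 4), low_mask);
        /* Per-byte bit count: table[low nibble] + table[high nibble] (max 8). */
        __m256i cnt = _mm256_add_epi8(_mm256_shuffle_epi8(lookup, lo),
                                      _mm256_shuffle_epi8(lookup, hi));
        /* Sum each group of 8 byte counts into a 64-bit lane. */
        total = _mm256_add_epi64(total, _mm256_sad_epu8(cnt, zero));
    }

    uint64_t lanes[4];
    _mm256_storeu_si256((__m256i *)lanes, total);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3]
         + popcount_scalar(data + i, len - i);
}

/* Hypothetical entry point: picks the AVX2 kernel when the CPU supports it. */
uint64_t popcount_bytes(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    if (len == 0)
        return 0;
    if (__builtin_cpu_supports("avx2"))
        return popcount_avx2(data, len);
    return popcount_scalar(data, len);
}
```

Calling `popcount_bytes(buf, n)` on a byte buffer returns the number of set bits. This sketch runs the horizontal sum (`_mm256_sad_epu8`) on every iteration for simplicity. A tuned version would likely accumulate the 8-bit counters over several iterations before widening, which is safe as long as no byte counter can exceed 255.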