Count the number of set bits in a 32-bit integer

This is known as the ‘Hamming Weight‘, ‘popcount’ or ‘sideways addition’. Some CPUs have a single built-in instruction to do it and others have parallel instructions which act on bit vectors. Instructions like x86’s popcnt (on CPUs where it’s supported) will almost certainly be fastest for a single integer. Some other architectures may have a … Read more

Fast counting the number of set bits in __m128i register

Here are some codes I used in an old project (there is a research paper about it). The function popcnt8 below computes the number of bits set in each byte. SSE2-only version (based on Algorithm 3 in Hacker’s Delight book): static const __m128i popcount_mask1 = _mm_set1_epi8(0x77); static const __m128i popcount_mask2 = _mm_set1_epi8(0x0F); static inline __m128i … Read more