Methods to vectorise histogram in SIMD?

Histogramming is almost impossible to vectorize, unfortunately.

You can probably optimise the scalar code somewhat however – a common trick is to use two histograms and then combine them at the end. This allows you to overlap loads/increments/stores and thereby bury some of the serial dependencies and associated latencies. Pseudo code:

init histogram 1 to all 0s
init histogram 2 to all 0s
loop
  get input value 1
  get input value 2
  load count for value 1 from histogram 1
  load count for value 2 from histogram 2
  increment count for histogram 1
  increment count for histogram 2
  store count for value 1 to histogram 1
  store count for value 2 to histogram 2
until done
combine histogram 1 and histogram 2

Leave a Comment