Comparing histograms is quite a subject in itself.
You’ve got two big classes of comparison functions : bin-to-bin comparison and cross-bin comparison.
- Bin-to-bin comparison : As you stated, standard sum of differences is quite bad. There’s an improvement: the Chi-squared distance. If
H1.red[0] = 0.001
andH2.red[0] = 0.011
, thenH2.red[0]
is much more important than ifH1.red[0] = 0.1
andH2.red[0] = 0.11
, even though in both cases|H1.red[0] - H2.red[0]| = 0.01
. - Cross-bin comparison : A standard example called the bin-similarity matrix requires some similarity matrix
M
where inM(i,j)
is the similarity between the binsi
andj
. Assumebin[i]
is red. Ifbin[j]
is dark red, thenM(i,j)
is large. Ifbin[j]
is green,M(i,j)
is small. Then, the distance between histogramsH1
andH2
would besqrt((H1-H2)*M*(H1-H2))
. This method takes in account what you’ve said about “close” bins! Earth Moving Distance (EMD) is another kind of cross-bin distance.
To finish, I’ve got three points :
- You should read this paper on histogram distance. It’s quite easy and introduces you to histogram distances. All the distances I talked about are summed up nicely in chapter 1. Honestly, the final thing described in the article is not that complex, but it’s probably overkill for your case.
- Cross-bin distance is very good, but can be costly (i.e : long to compute, because it involves a matrix, thus is O(n^2)). The simplest way to circumvent the expensive cross-bin computation (and it is widely done) is to do some soft assignment : if a pixel is red, then you should fill ALL the bins that are remotely looking like red (of course, giving more weight to the closest colors). Then you can use a bin-to-bin algorithm.
- A bit more math-centric : the previous point was all about reducing a cross-bin comparison to a bin-to-bin comparison. In fact, it consists of implicitly diagonalizing the similarity matrix M. If you can diagonalize
M = P'*D*P
whereP'
is the transpose ofP
, thensqrt((H1-H2)'*M*(H1-H2)) = sqrt((H1-H2)'*P'*D*P*(H1-H2)) = sqrt((P(H1-H2))'*D*(P(H1-H2)))
. Depending on how trivial it is for you to computeP(H1-H2)
, this can save you computation time. Intuitively, ifH1
is your original histogram,P*H1
is a soft assignment and you are using the implicit similarity matrixM = P'*Id*P
.