Comparing two histograms

Comparing histograms is quite a subject in itself.

You’ve got two big classes of comparison functions : bin-to-bin comparison and cross-bin comparison.

Bin-to-bin comparison : As you stated, standard sum of differences is quite bad. There’s an improvement: the Chi-squared distance. If H1.red[0] = 0.001 and H2.red[0] = 0.011, then H2.red[0] is much more important than if H1.red[0] = 0.1 and H2.red[0] = 0.11, even though in both cases |H1.red[0] - H2.red[0]| = 0.01.
Cross-bin comparison : A standard example called the bin-similarity matrix requires some similarity matrix M where in M(i,j) is the similarity between the bins i and j. Assume bin[i] is red. If bin[j] is dark red, then M(i,j) is large. If bin[j] is green, M(i,j) is small. Then, the distance between histograms H1 and H2 would be sqrt((H1-H2)*M*(H1-H2)). This method takes in account what you’ve said about “close” bins! Earth Moving Distance (EMD) is another kind of cross-bin distance.

To finish, I’ve got three points :

You should read this paper on histogram distance. It’s quite easy and introduces you to histogram distances. All the distances I talked about are summed up nicely in chapter 1. Honestly, the final thing described in the article is not that complex, but it’s probably overkill for your case.
Cross-bin distance is very good, but can be costly (i.e : long to compute, because it involves a matrix, thus is O(n^2)). The simplest way to circumvent the expensive cross-bin computation (and it is widely done) is to do some soft assignment : if a pixel is red, then you should fill ALL the bins that are remotely looking like red (of course, giving more weight to the closest colors). Then you can use a bin-to-bin algorithm.
A bit more math-centric : the previous point was all about reducing a cross-bin comparison to a bin-to-bin comparison. In fact, it consists of implicitly diagonalizing the similarity matrix M. If you can diagonalize M = P'*D*P where P' is the transpose of P, then sqrt((H1-H2)'*M*(H1-H2)) = sqrt((H1-H2)'*P'*D*P*(H1-H2)) = sqrt((P(H1-H2))'*D*(P(H1-H2))). Depending on how trivial it is for you to compute P(H1-H2), this can save you computation time. Intuitively, if H1 is your original histogram, P*H1 is a soft assignment and you are using the implicit similarity matrix M = P'*Id*P.