How to find the average colour of an image in Python with OpenCV?

How to fix the error

There are two potential causes for this error to happen:

  1. The file name is misspelled.
  2. The image file is not in the current working directory.

To fix this issue you should make sure the filename is correctly spelled (do case sensitive check just in case) and the image file is in the current working directory (there are two options here: you could either change the current working directory in your IDE or specify the full path of the file).

Average colour vs. dominant colour

Then to calculate the “average colour” you have to decide what you mean by that. In a grayscale image it is simply the mean of gray levels across the image. Colours are usually represented through 3-dimensional vectors whilst gray levels are scalars.

The average colour is the sum of all pixels divided by the number of pixels. However, this approach may yield a colour different to the most prominent visual color. What you might really want is dominant color rather than average colour.

Implementation

Let’s go through the code slowly. We start by importing the necessary modules and reading the image:

import cv2
import numpy as np
from skimage import io

img = io.imread('https://i.stack.imgur.com/DNM65.png')[:, :, :-1]

Then we can calculate the mean of each chromatic channel following a method analog to the one proposed by @Ruan B.:

average = img.mean(axis=0).mean(axis=0)

Next we apply k-means clustering to create a palette with the most representative colours of the image (in this toy example n_colors was set to 5).

pixels = np.float32(img.reshape(-1, 3))

n_colors = 5
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 200, .1)
flags = cv2.KMEANS_RANDOM_CENTERS

_, labels, palette = cv2.kmeans(pixels, n_colors, None, criteria, 10, flags)
_, counts = np.unique(labels, return_counts=True)

And finally the dominant colour is the palette colour which occurs most frequently on the quantized image:

dominant = palette[np.argmax(counts)]

Comparison of results

To illustrate the differences between both approaches I’ve used the following sample image:

lego

The obtained values for the average colour, i.e. a colour whose components are the means of the three chromatic channels, and the dominant colour calculated throug k-means clustering are rather different:

In [30]: average
Out[30]: array([91.63179156, 69.30190754, 58.11971896])

In [31]: dominant
Out[31]: array([179.3999  ,  27.341282,   2.294441], dtype=float32)

Let’s see how those colours look to better understand the differences between both approaches. On the left part of the figure below it is displayed the average colour. It clearly emerges that the calculated average colour does not properly describe the colour content of the original image. In fact, there’s no a single pixel with that colour in the original image. The right part of the figure shows the five most representative colours sorted from top to bottom in descending order of importance (occurrence frequency). This palette makes it evident that the dominant color is the red, which is consistent with the fact that the largest region of uniform colour in the original image corresponds to the red Lego piece.

Results

This is the code used to generate the figure above:

import matplotlib.pyplot as plt

avg_patch = np.ones(shape=img.shape, dtype=np.uint8)*np.uint8(average)

indices = np.argsort(counts)[::-1]   
freqs = np.cumsum(np.hstack([[0], counts[indices]/float(counts.sum())]))
rows = np.int_(img.shape[0]*freqs)

dom_patch = np.zeros(shape=img.shape, dtype=np.uint8)
for i in range(len(rows) - 1):
    dom_patch[rows[i]:rows[i + 1], :, :] += np.uint8(palette[indices[i]])
    
fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12,6))
ax0.imshow(avg_patch)
ax0.set_title('Average color')
ax0.axis('off')
ax1.imshow(dom_patch)
ax1.set_title('Dominant colors')
ax1.axis('off')
plt.show(fig)

TL;DR answer

In summary, despite the calculation of the average colour – as proposed in @Ruan B.’s answer – is correct, the yielded result may not adequately represent the colour content of the image. A more sensible approach is that of determining the dominant colour through vector quantization (clustering).

Leave a Comment