Detect white characters on black background using Tesseract

A paper by T. Kasar, J. Kumar, and A. G. Ramakrishnan describes one solution to the problem: “Font and Background Color Independent Text Binarization”. The paper can be found here. There is an implementation of the algorithm by Jason Funk. His implementation can be found here.
I have had some success with the algorithm. I think this type of solution is what you are looking for.

You might also find it helpful to review this recently asked question on background removal (OpenCV for OCR: How to compute thresholding levels for gray image OCR) and its answer. You may be able to separate regions of interest by background color and then hand each region to Tesseract for processing. Alternatively, after binarization you could invert the 8×8 pixel regions (described in the answer above) that fall in the black-background portion of the image (or vice versa) to create a uniform background.
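The region-inversion idea can be sketched roughly as follows. This is only a minimal illustration, not the method from the paper or from the linked answer: it assumes the image is already binarized to 0/255 in a NumPy array, and the function name `normalize_polarity`, the `tile` parameter, and the "mostly dark means dark background" rule are all my own assumptions.

```python
import numpy as np

def normalize_polarity(binary, tile=8):
    """Return a copy of a 0/255 image in which each tile has a light background.

    Assumption: a tile that is mostly dark holds light text on a dark
    background, so it is inverted. All other tiles are left unchanged.
    """
    out = binary.copy()
    height, width = out.shape
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            block = out[y:y + tile, x:x + tile]  # a view into `out`
            if block.mean() < 127.5:             # more than half the pixels are dark
                block[...] = 255 - block         # invert this tile in place
    return out

# Toy image: left tile is a white stroke on black, right tile is a black stroke on white.
img = np.zeros((8, 16), dtype=np.uint8)
img[:, 8:] = 255
img[2:6, 3] = 255
img[2:6, 11] = 0

result = normalize_polarity(img)
```

After this step both tiles contain dark strokes on a light background, so the whole image could be passed to Tesseract in one go. A per-tile majority vote can misfire on tiles that are mostly covered by a thick stroke, so on real images you may want larger tiles or a decision per connected region instead.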

Finally, you may find useful information by searching for solutions to the number plate (license plate) recognition problem. Many plates have background images or lighting artifacts that can interfere with recognition. The more general problem is background removal.
