Tesseract running error

You can grab eng.traineddata from GitHub:

    wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata

Check https://github.com/tesseract-ocr/tessdata for a full list of trained language data. Once you have the file(s), move them to the /usr/local/share/tessdata folder. Warning: some Linux distributions (such as openSUSE and Ubuntu) may expect them in /usr/share/tessdata instead. If you got the data from Google, unzip it first! …
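Since the right folder differs between distributions, here is a rough sketch of checking which one exists before downloading into it. The helper names find_tessdata_dir and fetch_traineddata are made up for this example and are not part of Tesseract; the two candidate paths are the ones mentioned above, and your system may use a different one.

```shell
#!/bin/sh
# Print the first directory among the arguments that exists.
# (find_tessdata_dir is a hypothetical helper, not part of Tesseract.)
find_tessdata_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Download one language file into the given tessdata directory.
# (fetch_traineddata is also hypothetical; it needs wget and write access.)
fetch_traineddata() {
    lang=$1
    dest=$2
    wget -O "$dest/$lang.traineddata" \
        "https://github.com/tesseract-ocr/tessdata/raw/main/$lang.traineddata"
}

# Example usage (uncomment to run; may need sudo):
# dir=$(find_tessdata_dir /usr/local/share/tessdata /usr/share/tessdata) \
#     && fetch_traineddata eng "$dir"
```

Afterwards, running tesseract --list-langs should show eng if the file landed in the folder your build actually reads.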

Limit characters tesseract is looking for

Create a config file (e.g. “letters”) in the tessdata/configs directory, usually /usr/share/tesseract/tessdata/configs or /usr/share/tesseract-ocr/tessdata/configs, and add this line to it:

    tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz

Listing every character explicitly is the safe choice; maybe a range like [a-z] works too, but I don’t know. Then call tesseract with the config name as the last argument, similar to this:

    tesseract input.tif output nobatch letters

That limits tesseract to recognizing only the wanted characters.
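The same steps can be scripted. This is only a sketch: write_whitelist_config is a made-up helper name, and the configs path in the usage comment is one of the two guesses above, so adjust it to wherever your install keeps its configs.

```shell
#!/bin/sh
# Write a Tesseract config file that whitelists the given characters.
# (write_whitelist_config is a hypothetical helper, not part of Tesseract.)
#   $1 = configs directory, $2 = config name, $3 = characters to allow
write_whitelist_config() {
    configs_dir=$1
    name=$2
    chars=$3
    printf 'tessedit_char_whitelist %s\n' "$chars" > "$configs_dir/$name"
}

# Example usage (uncomment to run; the configs path is an assumption and
# writing there may need root):
# write_whitelist_config /usr/share/tesseract-ocr/tessdata/configs letters \
#     abcdefghijklmnopqrstuvwxyz
# tesseract input.tif output nobatch letters
```

Keeping the character list as an argument makes it easy to create a second config, for example one for digits only, without editing the file by hand.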