Getting the bounding box of the recognized words using python-tesseract

Use pytesseract.image_to_data():

    import pytesseract
    from pytesseract import Output
    import cv2

    img = cv2.imread('image.jpg')
    d = pytesseract.image_to_data(img, output_type=Output.DICT)
    n_boxes = len(d['level'])
    for i in range(n_boxes):
        (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow('img', img)
    cv2.waitKey(0)

Among the data returned by pytesseract.image_to_data(): …
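The dictionary returned by image_to_data() with Output.DICT also carries 'text' and 'conf' lists alongside the box coordinates, so the boxes can be narrowed down to actual words. The following is a minimal sketch, not part of the original answer: the helper name word_boxes, the min_conf threshold of 60, and the hand-written sample dictionary are all assumptions for illustration.

```python
def word_boxes(data, min_conf=60.0):
    """Return (text, x, y, w, h) for each non-empty word at or above min_conf.

    `data` is expected to have the shape of
    pytesseract.image_to_data(img, output_type=Output.DICT).
    """
    boxes = []
    for i in range(len(data["text"])):
        text = data["text"][i].strip()
        # conf can be an int, float, or str depending on the pytesseract version;
        # rows that are not words (pages, blocks, lines) carry -1.
        conf = float(data["conf"][i])
        if text and conf >= min_conf:
            boxes.append(
                (text, data["left"][i], data["top"][i],
                 data["width"][i], data["height"][i])
            )
    return boxes


# Hand-written sample in the shape of an Output.DICT result (not real OCR output).
sample = {
    "level": [1, 5, 5],
    "left": [0, 10, 60],
    "top": [0, 12, 12],
    "width": [200, 40, 50],
    "height": [100, 20, 20],
    "conf": ["-1", "96.5", "31.0"],
    "text": ["", "Hello", "wor1d"],
}

print(word_boxes(sample))
```

Each returned tuple can be passed to cv2.rectangle() in the same way as the loop above, which draws only confident word boxes instead of every page, block, and line box.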

How do I resolve a TesseractNotFoundError?

I got this error because I installed pytesseract with pip but forgot to install the Tesseract binary.

On Linux:

    sudo apt update
    sudo apt install tesseract-ocr
    sudo apt install libtesseract-dev

On Mac:

    brew install tesseract

On Windows, download the binary from https://github.com/UB-Mannheim/tesseract/wiki, then add

    pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'

to your script. (replace path of tesseract binary …
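If the script has to run on more than one machine, one option is to look the binary up instead of hard-coding the path. This is a rough sketch under assumptions: the helper name resolve_tesseract_cmd and its candidates parameter are hypothetical, and it relies only on the standard library's shutil.which() and os.path.isfile().

```python
import os
import shutil


def resolve_tesseract_cmd(candidates=(), which=shutil.which):
    """Return a path to the tesseract binary, or None if none is found.

    Looks on PATH first, then tries each fallback path in `candidates`.
    `which` is injectable so the lookup can be replaced in tests.
    """
    found = which("tesseract")
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Example fallback list; adjust to wherever the installer put the binary.
cmd = resolve_tesseract_cmd(
    [r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"]
)
print(cmd)
```

When the result is not None, assign it to pytesseract.pytesseract.tesseract_cmd before calling any OCR function. When it is None, the binary is most likely not installed, which is the situation that raises TesseractNotFoundError.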

Detect text region in image using OpenCV

    import cv2

    def captch_ex(file_name):
        img = cv2.imread(file_name)
        img_final = cv2.imread(file_name)
        img2gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        ret, mask = cv2.threshold(img2gray, 180, 255, cv2.THRESH_BINARY)
        image_final = cv2.bitwise_and(img2gray, img2gray, mask=mask)
        ret, new_img = cv2.threshold(image_final, 180, 255, cv2.THRESH_BINARY)  # for black text, cv2.THRESH_BINARY_INV
        '''
        line 8 to 12 : Remove noisy portion
        '''
        kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))  # …