OCR and character similarity

For recognition or classification, most OCR systems use neural networks.

These must be properly configured for the desired task (number of layers, internal interconnection architecture, and so on). Another problem with neural networks is that they must be properly trained, which is hard to do well: you need to know things like the proper training dataset size, so that it contains enough information but does not over-train the network. If you do not have experience with neural networks, do not go this way if you need to implement it yourself!

There are also other ways to compare patterns:

  1. vector approach

    • polygonize image (edges or border)
    • compare polygon similarity (surface area, perimeter, shape, …)
  2. pixel approach

    You can compare images based on:

    • histogram
    • DFT/DCT spectral analysis
    • size
    • number of occupied pixels per each line
    • start position of occupied pixel in each line (from left)
    • end position of occupied pixel in each line (from right)
    • the previous 3 parameters can also be computed per row (here meaning the direction perpendicular to the lines above)
    • list of points of interest (points where something changes, like an intensity bump, edge, …)

    You create a feature list for each tested character and compare it to the feature lists of your font; the closest match is your character. The feature lists can also be scaled to some fixed size (like 64x64), so the recognition becomes invariant to scaling.
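The scaling step could look roughly like the sketch below. It is only an illustration under my own assumptions: the character is a binary image stored row-major in a `std::vector<unsigned char>`, `N` is fixed to 64, nearest-neighbor sampling is used, and `scale_to_NxN` is a hypothetical helper name.

```cpp
#include <cassert>
#include <vector>

const int N = 64; // assumed fixed feature size, as in the 64x64 example

// Hypothetical helper: rescale a w*h binary image (row-major, nonzero = occupied)
// to N*N using nearest-neighbor sampling.
std::vector<unsigned char> scale_to_NxN(const std::vector<unsigned char>& src, int w, int h)
{
    assert(w > 0 && h > 0 && (int)src.size() == w * h);
    std::vector<unsigned char> dst(N * N, 0);
    for (int y = 0; y < N; y++)
    {
        int sy = (y * h) / N;              // source line for destination line y
        for (int x = 0; x < N; x++)
        {
            int sx = (x * w) / N;          // source column for destination column x
            dst[y * N + x] = src[sy * w + sx] ? 1 : 0;
        }
    }
    return dst;
}
```

In practice you would probably crop the character to its bounding box before scaling, so that empty margins do not change the result.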

    Here is a sample of the features I use for OCR:

    [image: OCR character features]

    In this case the feature size is scaled to fit in NxN, so each character has 6 arrays of N numbers, like:

     int row_pixels[N]; // 1st image
     int lin_pixels[N]; // 2nd image
     int row_y0[N];     // 3rd image green
     int row_y1[N];     // 3rd image red
     int lin_x0[N];     // 4th image green
     int lin_x1[N];     // 4th image red
    
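One way the six arrays above could be filled is sketched below. This is not necessarily how the original features were computed. It rests on several assumptions of mine: the character is already an N x N binary image in row-major order, `lin_*` arrays are indexed by y (horizontal lines), `row_*` arrays are indexed by x (the perpendicular direction), the end positions are stored as the index of the last occupied pixel, and -1 marks an empty line. The `Features` struct and `compute_features` are hypothetical names.

```cpp
#include <cassert>

const int N = 64; // assumed feature size

// Hypothetical container for the six per-character arrays.
struct Features
{
    int row_pixels[N]; // occupied pixels per column x
    int lin_pixels[N]; // occupied pixels per line y
    int row_y0[N];     // first occupied y in column x (-1 if empty)
    int row_y1[N];     // last occupied y in column x (-1 if empty)
    int lin_x0[N];     // first occupied x in line y (-1 if empty)
    int lin_x1[N];     // last occupied x in line y (-1 if empty)
};

// img is an N*N row-major binary image, nonzero = occupied pixel.
Features compute_features(const unsigned char* img)
{
    assert(img != nullptr);
    Features f;
    for (int i = 0; i < N; i++)
    {
        f.row_pixels[i] = 0;  f.lin_pixels[i] = 0;
        f.row_y0[i] = -1;     f.row_y1[i] = -1;
        f.lin_x0[i] = -1;     f.lin_x1[i] = -1;
    }
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++)
        {
            if (!img[y * N + x]) continue;
            f.lin_pixels[y]++;
            f.row_pixels[x]++;
            if (f.lin_x0[y] < 0) f.lin_x0[y] = x; // x ascends, so first hit is the start
            f.lin_x1[y] = x;                      // last hit overwrites the end
            if (f.row_y0[x] < 0) f.row_y0[x] = y; // y ascends, so first hit is the start
            f.row_y1[x] = y;
        }
    return f;
}
```

A single pass over the image is enough, because both loops run in ascending order and the first and last hits give the start and end positions directly.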

    Now pre-compute all features for each character in your font and for each character you read. Then find the closest match in the font:

    • min distance between all feature vectors/arrays
    • not exceeding some threshold difference
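The two criteria above can be sketched as follows. This is only a sketch: I assume each character's feature arrays have been concatenated into one `std::vector<int>`, I use the sum of absolute differences as the distance (other metrics would work too), and `feature_distance`, `closest_glyph` and `max_distance` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Sum of absolute differences between two equally sized feature vectors.
long feature_distance(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(a.size() == b.size());
    long d = 0;
    for (std::size_t i = 0; i < a.size(); i++)
        d += std::labs((long)a[i] - (long)b[i]);
    return d;
}

// Returns the index of the closest font glyph, or -1 if the font is empty
// or even the best match exceeds max_distance.
int closest_glyph(const std::vector<std::vector<int>>& font,
                  const std::vector<int>& unknown,
                  long max_distance)
{
    int best = -1;
    long best_d = 0;
    for (std::size_t i = 0; i < font.size(); i++)
    {
        long d = feature_distance(font[i], unknown);
        if (best < 0 || d < best_d) { best = (int)i; best_d = d; }
    }
    if (best >= 0 && best_d > max_distance) return -1; // reject weak matches
    return best;
}
```

The threshold is what lets you reject noise or symbols that are not in the font at all; its value has to be tuned for your font and feature size.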

    This is partially invariant to rotation and skew, up to a point. I do OCR for filled characters, so for an outlined font it may need some tweaking.

[Notes]

For the comparison you can use a distance or a correlation coefficient.
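As an illustration of the correlation option, here is a minimal sketch of the Pearson correlation coefficient between two feature arrays. The function name `correlation` and the choice to return 0 for a constant array (where the coefficient is undefined) are my assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Pearson correlation coefficient of two equally sized feature arrays.
// Result is in <-1,+1>; values near +1 mean the arrays have a similar shape.
double correlation(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(a.size() == b.size() && !a.empty());
    const double n = (double)a.size();
    double ma = 0.0, mb = 0.0;
    for (std::size_t i = 0; i < a.size(); i++) { ma += a[i]; mb += b[i]; }
    ma /= n; mb /= n;                       // mean of each array
    double sab = 0.0, saa = 0.0, sbb = 0.0;
    for (std::size_t i = 0; i < a.size(); i++)
    {
        double da = a[i] - ma, db = b[i] - mb;
        sab += da * db;                     // covariance term
        saa += da * da;                     // variance terms
        sbb += db * db;
    }
    if (saa == 0.0 || sbb == 0.0) return 0.0; // constant array: undefined, report 0 (assumption)
    return sab / std::sqrt(saa * sbb);
}
```

Unlike a plain distance, the correlation coefficient ignores a constant offset and a uniform scale of the values, so with it the best match is the largest value instead of the smallest.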
