Short example of regular expression converted to a state machine?

A rather convenient way to help look at this to use python’s little-known re.DEBUG flag on any pattern:

>>> re.compile(r'<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>', re.DEBUG)
literal 60
subpattern 1
  in
    range (65, 90)
  max_repeat 0 65535
    in
      range (65, 90)
      range (48, 57)
at at_boundary
max_repeat 0 65535
  not_literal 62
literal 62
subpattern 2
  min_repeat 0 65535
    any None
literal 60
literal 47
groupref 1
literal 62

The numbers after ‘literal’ and ‘range’ refer to the integer values of the ascii characters they’re supposed to match.

Leave a Comment