Extracting matches with the original case used in the pattern during a case-insensitive search

You can build the pattern dynamically, wrapping each search word in a named group whose name encodes the word's index (g0, g1, and so on). For each match, check which group matched and look up the corresponding word in its original case:

import re

words = ["ERP", "Gap"]
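# Map each group name (g0, g1, ...) to the word in its original case
words_dict = {f'g{i}': item for i, item in enumerate(words)}

# Wrap each word in its own named group; re.escape keeps regex metacharacters in a word literal
rx = rf"\b(?:{'|'.join(rf'(?P<g{i}>{re.escape(item)})' for i, item in enumerate(words))})\b"

text = "ERP is integral part of GAP, so erp can never be ignored, ErP!"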

results = []
for match in re.finditer(rx, text, flags=re.IGNORECASE):
    results.append( [words_dict.get(key) for key,value in match.groupdict().items() if value][0] )

print(results) # => ['ERP', 'Gap', 'ERP', 'ERP']

The pattern will look like \b(?:(?P<g0>ERP)|(?P<g1>Gap))\b:

  • \b – a word boundary
  • (?: – start of a non-capturing group encapsulating pattern parts:
    • (?P<g0>ERP) – Group “g0”: ERP
    • | – or
    • (?P<g1>Gap) – Group “g1”: Gap
  • ) – end of the group
  • \b – a word boundary.
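To check the breakdown above, the pattern can be printed and a single match inspected. This is a minimal sketch: the variable names `pattern` and `m` and the sample string "mind the gap" are illustrative and not part of the original answer.

```python
import re

words = ["ERP", "Gap"]

# Build one named group per word: g0 for words[0], g1 for words[1], ...
pattern = r"\b(?:" + "|".join(
    f"(?P<g{i}>{re.escape(w)})" for i, w in enumerate(words)
) + r")\b"
print(pattern)  # \b(?:(?P<g0>ERP)|(?P<g1>Gap))\b

# Only the group whose alternative matched holds a value; the others are None
m = re.search(pattern, "mind the gap", flags=re.IGNORECASE)
print(m.group(), m.groupdict())  # gap {'g0': None, 'g1': 'gap'}
```

The groupdict() output shows why the group name is enough to recover the original spelling: the matched text is lowercase "gap", but the only non-empty group is g1, which maps back to "Gap".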

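Note that the [0] index in [words_dict.get(key) for key,value in match.groupdict().items() if value][0] is safe in all cases: the named groups are alternatives, so whenever there is a match, exactly one group has matched and the filtered list holds exactly one item.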
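Since exactly one named group takes part in each match, the lookup could also be written with match.lastgroup, which holds the name of the last matched capturing group. The following is a sketch of that variant, not the code from the answer above; the helper name `find_original_case` is hypothetical.

```python
import re

def find_original_case(words, text):
    """For each case-insensitive match in text, return the word as spelled in words."""
    # Group names g0, g1, ... follow the same convention as the answer above
    by_group = {f"g{i}": w for i, w in enumerate(words)}
    alternatives = "|".join(
        f"(?P<g{i}>{re.escape(w)})" for i, w in enumerate(words)
    )
    rx = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    # Only one alternative matches at a time, so lastgroup names that group
    return [by_group[m.lastgroup] for m in rx.finditer(text)]

text = "ERP is integral part of GAP, so erp can never be ignored, ErP!"
print(find_original_case(["ERP", "Gap"], text))  # ['ERP', 'Gap', 'ERP', 'ERP']
```

This avoids scanning groupdict() for every match. It relies on the words not adding capturing groups of their own, which re.escape ensures here.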
