Regex for removing parts of the string

import re

txt = "A regular expression is a special text string for describing a search pattern."
pattern = "(.*) regular(.*) text(.*)"

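result = re.sub(pattern, r"\1\2\3", txt)   # keep only the three captured groups

print(result)    # for testing only: A expression is a special string for describing a search pattern.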

The explanation:

As you can see, your regular expression is

(.*) regular(.*) text(.*)

Expressions in parentheses are so-called capture groups. All three have the same form:

.*

which means that each of them matches any run of characters: . means any character (except a newline, unless the re.DOTALL flag is used), and * means any number of repetitions of it, including zero (the empty string).
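A quick sketch of what .* accepts, using re.fullmatch (which only succeeds if the whole string matches). The sample strings are made up for illustration; they are not from the original answer.

```python
import re

# Zero characters is allowed, so the empty string matches
empty_ok = re.fullmatch(r".*", "") is not None                      # True
# Ordinary text without newlines matches as a whole
plain_ok = re.fullmatch(r".*", "any text at all") is not None       # True
# By default . does not match a newline, so the whole string cannot match
newline_default = re.fullmatch(r".*", "two\nlines") is not None     # False
# With re.DOTALL, . matches the newline too
newline_dotall = re.fullmatch(r".*", "two\nlines", re.DOTALL) is not None  # True

print(empty_ok, plain_ok, newline_default, newline_dotall)
```

For the single-line sentence in the example above, the newline caveat does not come into play.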

The captured texts can then be referred to as \1, \2 and \3, respectively, so in this notation your original text is the same as

\1 regular\2 text\3 
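To see exactly what \1, \2 and \3 hold for the example sentence, the groups can be inspected with re.match and .groups(). This is a small sketch reusing the answer's txt and pattern; note that \2 and \3 begin with a space, because the literal parts " regular" and " text" only consume the space before each word.

```python
import re

txt = "A regular expression is a special text string for describing a search pattern."
pattern = "(.*) regular(.*) text(.*)"

m = re.match(pattern, txt)
g1, g2, g3 = m.groups()   # the texts referred to as \1, \2, \3
print(repr(g1))  # 'A'
print(repr(g2))  # ' expression is a special'
print(repr(g3))  # ' string for describing a search pattern.'

# Putting the literal parts back between the groups rebuilds the original text
print(g1 + " regular" + g2 + " text" + g3 == txt)  # True
```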

So as the replacement string in the re.sub() call we keep only

\1\2\3

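which effectively strips out the parts " regular" and " text" (each together with its leading space). In the code the replacement is written as the raw string r"\1\2\3", because in an ordinary Python string literal "\1" would be the control character chr(1), not a group reference.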
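The same idea can be wrapped in a small function. This is a sketch only: remove_two_words is a hypothetical helper, not part of the original answer, and it assumes both words occur in that order (otherwise re.sub finds no match and returns the text unchanged). It uses re.escape so that characters such as . or * in the words are treated literally.

```python
import re

def remove_two_words(text: str, first: str, second: str) -> str:
    """Hypothetical helper: drop ' first' and ' second' (each with its
    leading space) using the same three-group pattern as above."""
    # re.escape keeps regex metacharacters in the words literal
    pattern = "(.*) " + re.escape(first) + "(.*) " + re.escape(second) + "(.*)"
    # Raw string: \1\2\3 are group references, not octal escapes
    return re.sub(pattern, r"\1\2\3", text)

txt = "A regular expression is a special text string for describing a search pattern."
print(remove_two_words(txt, "regular", "text"))
# A expression is a special string for describing a search pattern.

# Why the raw string matters: without the r prefix, "\1" is chr(1)
print("\1" == chr(1))   # True
print(r"\1" == "\\1")   # True
```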
