How do Python and the regex module handle backslashes?

You need to understand that each time you write a pattern, it is first interpreted by Python as a string literal, and the resulting string is then read and interpreted a second time by the regex engine.
Let's describe what happens:

>>> s="\r"

s contains a single character, CR (carriage return).

>>> re.match('\r', s)
<_sre.SRE_Match object; span=(0, 1), match="\r">

Here '\r' is a string that contains CR, so the regex engine receives a literal CR.

>>> re.match('\\r', s)
<_sre.SRE_Match object; span=(0, 1), match="\r">

The string is now a literal backslash followed by a literal r. The regex engine receives these two characters, and since \r is a regex escape sequence that also means CR, you obtain a match too.

>>> re.match('\\\r', s)
<_sre.SRE_Match object; span=(0, 1), match="\r">

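The string contains a literal backslash and a literal CR, so the regex engine receives \ followed by CR. Since \CR isn't a known regex escape sequence (a backslash in front of a character that is not an ASCII letter or digit just stands for that character), the backslash is ignored and you obtain a match.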
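
The three cases can be checked in a short script. This is only a sketch: the fourth pattern, the raw string r'\r', is an addition that does not appear in the session above, and the names patterns and results are purely illustrative.

```python
import re

s = "\r"  # one character: CR (carriage return)

# Each pattern is written as a Python string literal; the comment shows
# the characters the regex engine receives after Python has parsed it.
patterns = [
    '\r',     # CR
    '\\r',    # backslash, r
    '\\\r',   # backslash, CR
    r'\r',    # backslash, r (raw string, same characters as '\\r')
]

results = []
for p in patterns:
    # Show the code points that actually make up the pattern string.
    code_points = [hex(ord(c)) for c in p]
    matched = re.match(p, s) is not None
    results.append(matched)
    print(code_points, matched)
```

Printing the code points makes the first stage (string parsing) visible, while the boolean shows the outcome of the second stage (regex parsing).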

Note that for the regex engine, a literal backslash is written as the escape sequence \\, so the pattern string must be r'\\' or '\\\\'.
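
As a quick check of the note above, here is a small sketch. The variable names (text, raw_pattern, plain_pattern, lone_backslash_rejected) are made up for the example, and the last part assumes that a pattern consisting of a single backslash is rejected with re.error, which is what current CPython versions do.

```python
import re

text = "a\\b"  # three characters: a, backslash, b

# Both spellings produce the same two-character string: backslash, backslash.
raw_pattern = r'\\'
plain_pattern = '\\\\'
same = raw_pattern == plain_pattern
print(same, len(raw_pattern))   # True 2

# The regex engine reads those two characters as "one literal backslash".
match = re.search(raw_pattern, text)
print(match.span())             # (1, 2)

# A pattern made of a single backslash is an incomplete escape sequence.
try:
    re.compile('\\')
    lone_backslash_rejected = False
except re.error:
    lone_backslash_rejected = True
print(lone_backslash_rejected)  # True
```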
