The confusion arises because the backslash character \ is used as an escape at two different levels. First, the Python interpreter itself performs substitutions for \ in string literals before the re module ever sees your string. For instance, \n is converted to a newline character, \t is converted to a tab character, and so on. To get an actual \ character, you escape it as well, so \\ gives a single \ character. If the character following the \ isn't a recognized escape character, the \ is treated like any other character and passed through. I don't recommend depending on this, because recent Python versions warn about such invalid escape sequences (a DeprecationWarning since Python 3.6, upgraded to a SyntaxWarning in Python 3.12). Instead, always escape your \ characters by doubling them (\\).
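A quick check of this first level, using plain string literals only (no re involved): each escape sequence below becomes exactly one character in the resulting string. The variable names are just for illustration.

```python
# Each escape sequence in a normal string literal becomes a single character.
newline = "\n"
tab = "\t"
backslash = "\\"

print(len(newline), len(tab), len(backslash))  # 1 1 1
print(ord(newline), ord(tab), ord(backslash))  # 10 9 92
```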
If you want to see how Python is expanding your string escapes, just print out the string. For example:
s="a\\b\tc"
print(s)
If s is part of an aggregate data type, such as a list or a tuple, and you print that aggregate, Python displays the string's repr(): it encloses the string in quotes (normally single quotes) and shows the \ escapes in a canonical form, so be aware of how your string is being printed. Likewise, if you just type a quoted string into the interactive interpreter, it echoes the string enclosed in quotes with \ escapes.
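To make the difference concrete, here is the same string printed directly, via repr(), and inside a list. The string holds five characters (a, backslash, b, tab, c); only the direct print shows them as they really are.

```python
s = "a\\b\tc"

print(s)        # the real characters: a\b, then a tab, then c
print(repr(s))  # 'a\\b\tc'
print([s])      # ['a\\b\tc']  (list elements are displayed with repr)
print(len(s))   # 5
```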
Once you know how your string is being encoded, you can think about the second level: what the re module does with it. For instance, if you want to match a literal \ with the re module, you need to pass \\ to re, which means you need to write \\\\ in your quoted Python string. The Python string will end up containing \\ (two characters), and the re module will treat this as a single literal \ character.
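A small sketch of both levels working together, using a made-up sample string. The pattern literal "\\\\" reaches re as two characters, which re reads as one literal backslash. Passing only "\\" leaves re with a lone, incomplete escape, which it rejects with re.error (the exact message varies between Python versions, so it is not shown here).

```python
import re

pattern = "\\\\"             # Python literal -> the two characters \\
print(len(pattern))           # 2: this is what re actually receives

text = "C:\\temp"             # sample string C:\temp (7 characters)
m = re.search(pattern, text)  # re reads \\ as "one literal backslash"
print(m.group(), m.start())   # \ 2

# A pattern that is only one backslash after Python's processing is incomplete.
try:
    re.compile("\\")
    lone_backslash_accepted = True
except re.error as exc:
    lone_backslash_accepted = False
    print("re.error:", exc)
```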
An alternative way to include \ characters in Python strings is to use raw strings, in which backslashes are kept as literal characters instead of starting escapes. For example, r'a\b' is equivalent to "a\\b".
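Raw strings remove the first level of escaping, so only re's own escaping remains. A minimal illustration follows; the path-like string is invented sample data.

```python
import re

# A raw string keeps every backslash, so r"\\" is the same two characters as "\\\\".
print(r'a\b' == "a\\b")   # True
print(r"\\" == "\\\\")    # True

# This makes raw strings convenient for regex patterns.
path = "C:\\temp\\new"    # sample string C:\temp\new
print(re.findall(r"\\", path))  # ['\\', '\\']
print(re.split(r"\\", path))    # ['C:', 'temp', 'new']
```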