In regular expressions, what are backtracking and backreferences?

Backreferences and backtracking are two different things. A backreference reuses the text captured by an earlier group later in the same pattern, e.g.

(['"]).*?\1

This will match a single- or double-quoted string (ignoring escapes for the moment). The \1 is a backreference to whatever the first group captured (the opening single or double quote), so the pattern only matches when the closing quote is the same character as the opening one.
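To see the backreference at work, here is a minimal sketch in Python's re module. The helper name find_quoted and the sample strings are made up for illustration, and like the pattern above it ignores escaped quotes.

```python
import re

# Group 1 captures the opening quote; \1 requires the same character to close the string.
QUOTED = re.compile(r"""(['"]).*?\1""")

def find_quoted(text):
    """Return every quoted substring (quotes included) found in text.

    Hypothetical helper for illustration; escaped quotes are not handled.
    """
    return [m.group(0) for m in QUOTED.finditer(text)]

# The apostrophe inside the double-quoted part does not end the match,
# because \1 only accepts the quote character that opened it.
for s in find_quoted('''say "it's fine" or 'ok' now'''):
    print(s)

# A string opened with " and closed with ' is not matched at all.
print(find_quoted('''"mismatched' '''))
```

Without the backreference, a pattern such as ['"].*?['"] would stop at the apostrophe in it's, which is the problem \1 avoids.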

Backtracking, on the other hand, is what a regular expression engine does during matching when an attempt fails: it goes back to an earlier point where it had a choice and tries the next alternative. For example, if I’m matching the expression

.+b

against the string

aaaaaabcd

then the greedy .+ will first consume the whole string, aaaaaabcd, leaving nothing for the b to match. This fails, so it backtracks one character: .+ matches aaaaaabc and the b is compared against the remaining d. This fails too, so it backtracks again, matches aaaaaab for the .+ and compares the b against the c. That fails as well, so it backtracks once more, tries aaaaaa for the .+, then matches the b against the b and succeeds.
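The walk-through can be reproduced with a small simulation. This is only a sketch of the idea, not how a real engine is implemented: the function trace_dot_plus_b is a hypothetical helper hard-wired to the pattern .+b, it only tries matches starting at index 0, and it assumes the text contains no newline (which . would not match by default).

```python
import re

def trace_dot_plus_b(text):
    """Mimic how a backtracking engine tries `.+b` starting at index 0.

    Returns (attempts, match), where each attempt is a pair
    (text consumed by .+, character the b is compared against).
    Hypothetical helper for illustration only.
    """
    attempts = []
    # Greedy: .+ starts by consuming everything, then gives back one character at a time.
    for end in range(len(text), 0, -1):
        consumed = text[:end]
        next_char = text[end:end + 1]  # empty string when .+ has consumed all the input
        attempts.append((consumed, next_char))
        if next_char == "b":
            return attempts, text[:end + 1]
    return attempts, None

attempts, match = trace_dot_plus_b("aaaaaabcd")
for consumed, next_char in attempts:
    print(f".+ = {consumed!r}, b is compared against {next_char!r}")

# The real engine arrives at the same overall match.
assert match == re.match(r".+b", "aaaaaabcd").group(0)
```

Each printed line corresponds to one step of the description above, with the last line being the attempt that succeeds.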
