Why do regex engines allow / automatically attempt matching at the end of the input string?

I am posting this answer only to demonstrate why a regex engine might want to allow part of a pattern to appear after the final $ anchor. Suppose we need a regex that matches a string with the following rules:

  • starts with three digits
  • followed by one or more letters, digits, hyphens, or underscores
  • the final character is a letter or digit

We could write the following pattern:

^\d{3}[A-Za-z0-9\-_]*[A-Za-z0-9]$

But this is a bit bulky, because we have to use two similar character classes adjacent to each other. Instead, we could write the pattern as:

^\d{3}[A-Za-z0-9\-_]+$(?<!_|-)

or

^\d{3}[A-Za-z0-9\-_]+(?<!_|-)$

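Here we eliminated one of the character classes and instead used a negative lookbehind, placed either just after or just before the $ anchor, to assert that the final character is not an underscore or hyphen. Because both $ and the lookbehind are zero-width assertions, they test the same position, so their order does not change the result.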
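As a quick sanity check, here is a minimal sketch in Python's re module (my choice of engine, not something the question specifies) that runs the three patterns above against a few sample strings. The sample strings and the helper name verdicts are made up for illustration. Other engines may treat lookbehinds or $ differently, so this is one data point rather than a general proof.

```python
import re

# The three patterns from this answer, copied unchanged.
TWO_CLASSES = re.compile(r"^\d{3}[A-Za-z0-9\-_]*[A-Za-z0-9]$")
LOOKBEHIND_AFTER = re.compile(r"^\d{3}[A-Za-z0-9\-_]+$(?<!_|-)")
LOOKBEHIND_BEFORE = re.compile(r"^\d{3}[A-Za-z0-9\-_]+(?<!_|-)$")

# Hypothetical sample inputs chosen to exercise each rule.
SAMPLES = ["123abc", "123a-b_c", "123abc-", "123abc_", "123", "12abc", "1234"]


def verdicts(pattern):
    """Return one boolean per sample: True if the pattern matches it."""
    return [pattern.search(s) is not None for s in SAMPLES]


if __name__ == "__main__":
    for name, pattern in [
        ("two classes", TWO_CLASSES),
        ("lookbehind after $", LOOKBEHIND_AFTER),
        ("lookbehind before $", LOOKBEHIND_BEFORE),
    ]:
        print(name, verdicts(pattern))
```

In Python, at least, all three patterns accept and reject the same samples: strings ending in a hyphen or underscore are rejected, as are strings with fewer than three leading digits or nothing after them.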

Other than a lookbehind, I see no reason for a regex engine to allow anything to appear after the $ anchor. My point is that an engine may allow a lookbehind to appear after $, and there are cases where doing so makes logical sense.
