Does lookaround affect which languages can be matched by regular expressions?

The answer to the question you ask, whether regular expressions augmented with lookaround can recognise a larger class of languages than the regular languages, is no. A proof is relatively straightforward, but an algorithm to translate a regular expression containing lookarounds into one without them is messy. First: note that you can …
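The excerpt is cut off before the proof itself. One standard ingredient in arguments of this kind (not necessarily the one the full answer uses) is that regular languages are closed under intersection, and a lookahead anchored at the start of a fully matched pattern behaves like an intersection. The Python sketch below checks this for one example by brute force over short strings. The two patterns, the helper names and the length bound are illustrative assumptions, and a finite check is a sanity test, not a proof.

```python
import itertools
import re

# Strings over {a, b} with an even number of a's, plus an "ends in b"
# condition expressed as a lookahead: this is the intersection of two
# regular languages.
WITH_LOOKAHEAD = re.compile(r"(?=[ab]*b\Z)(?:b*ab*a)*b*")

# A hand-derived, lookaround-free regex for the same intersection.
WITHOUT_LOOKAROUND = re.compile(r"(?:b*ab*a)*b+")


def strings(max_len):
    """Yield every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product("ab", repeat=n):
            yield "".join(chars)


def same_language(max_len):
    """True if both patterns fully match exactly the same short strings."""
    return all(
        bool(WITH_LOOKAHEAD.fullmatch(s)) == bool(WITHOUT_LOOKAROUND.fullmatch(s))
        for s in strings(max_len)
    )


print(same_language(10))  # True
```

The point of the example is that the lookahead did not add power: the same language has a plain regular expression, which is what closure under intersection guarantees in general.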

Fixed Length Regex Required?

From the Python re documentation: "(?<!…) Matches if the current position in the string is not preceded by a match for …. This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning …"
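A quick way to see the fixed-length rule in practice is to ask Python's re module to compile a few lookbehinds. In the sketch below the patterns, the sample text and the compiles helper are illustrative; the check relies only on re.compile raising re.error for a pattern it rejects.

```python
import re

# A fixed-length negative lookbehind: match a "b" not immediately preceded by "ab".
not_after_ab = re.compile(r"(?<!ab)b")
positions = [m.start() for m in not_after_ab.finditer("abb bb")]
print(positions)  # [1, 4, 5]


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles(r"(?<!ab)b"))    # True: the lookbehind is always 2 characters
print(compiles(r"(?<!a+)b"))    # False: a+ has no fixed length
print(compiles(r"(?<!a|bc)d"))  # False: the alternatives differ in length
```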

Backreferences in lookbehind

It looks like your suspicion is correct: backreferences generally can’t be used in Java lookbehinds. The workaround you proposed makes the finite length of the lookbehind explicit, which looks very clever to me. I was intrigued to find out what Python does with this regex. Python only supports fixed-length lookbehind, not finite-length like Java, but …
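The fixed-versus-finite distinction can be seen directly with Python's re module. The sketch below does not reproduce the workaround from the question (which isn't shown here). It uses a hypothetical bounded lookbehind, (?<=x\d{1,3}), of the kind Java's java.util.regex accepts, and a common Python-side rewrite that splits it into one fixed-width lookbehind per possible length.

```python
import re

# A bounded ("finite-length") lookbehind: 2, 3 or 4 characters long.
# Python's re rejects it because the length is not fixed.
try:
    re.compile(r"(?<=x\d{1,3})y")
    bounded_ok = True
except re.error:
    bounded_ok = False
print(bounded_ok)  # False

# Spelling out each possible length gives three fixed-width lookbehinds,
# which Python accepts and which together express the same condition.
expanded = re.compile(r"(?:(?<=x\d)|(?<=x\d\d)|(?<=x\d\d\d))y")

samples = ["x1y", "x12y", "x123y", "x1234y", "12y"]
matched = [s for s in samples if expanded.search(s)]
print(matched)  # ['x1y', 'x12y', 'x123y']
```

The rewrite grows with the bound (one branch per length), so it is practical only when the maximum length is small.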

What’s the technical reason for “lookbehind assertion MUST be fixed length” in regex?

Lookahead and lookbehind aren’t nearly as similar as their names imply. The lookahead expression works exactly the same as it would if it were a standalone regex, except it’s anchored at the current match position and it doesn’t consume what it matches. Lookbehind is a whole different story. Starting at the current match position, it …
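Because a fixed-length lookbehind always covers the same number of characters, an engine can evaluate it by stepping back that many characters and checking that the subpattern matches exactly the span ending at the current position. The Python sketch below illustrates that idea with a hypothetical helper, lookbehind_at, and compares it with the built-in (?<=…). It is not a description of how any particular engine is implemented; it relies only on Pattern.fullmatch accepting pos and endpos arguments.

```python
import re


def lookbehind_at(inner, width, text, pos):
    """Hypothetical helper: does `inner` match exactly text[pos - width:pos]?"""
    start = pos - width
    if start < 0:
        return False  # not enough text behind the current position
    # fullmatch(string, pos, endpos) restricts matching to text[start:pos].
    return inner.fullmatch(text, start, pos) is not None


inner = re.compile(r"\d\d")  # the lookbehind's body, always 2 characters wide
builtin = re.compile(r"(?<=\d\d)")

text = "a12b3c45"
manual = [p for p in range(len(text) + 1) if lookbehind_at(inner, 2, text, p)]
engine = [m.start() for m in builtin.finditer(text)]
print(manual, engine)  # [3, 8] [3, 8]
```

With an unbounded body such as \d+ there is no single width to step back by, which is one way to see why many engines insist on a fixed (or at least bounded) length.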

Regular Expression Lookbehind doesn’t work with quantifiers (‘+’ or ‘*’)

Many regular expression libraries only allow restricted expressions inside lookbehind assertions, for example ones that:
- only match strings of the same fixed length: (?<=foo|bar|\s,\s) (three characters each)
- only match strings of fixed lengths: (?<=foobar|\r\n) (each branch with a fixed length)
- only match strings with an upper bound on their length: (?<=\s{,4}) (up to four repetitions)

The …
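To see where a particular library falls among these kinds, one can simply try compiling each example. The Python sketch below does that for the re module, which accepts only the first form. The trailing x, the sample text and the compiles helper are illustrative, and the last lines show one common rewrite (a separate lookbehind per branch) rather than the answer's own recommendation, which is cut off here.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Python's re requires every alternative in a lookbehind to have the same length.
print(compiles(r"(?<=foo|bar|\s,\s)x"))  # True: every branch is 3 characters
print(compiles(r"(?<=foobar|\r\n)x"))    # False: branches of length 6 and 2
print(compiles(r"(?<=\s{,4})x"))         # False: length anywhere from 0 to 4

# Splitting the alternation into separate lookbehinds makes each one fixed-length.
split = re.compile(r"(?:(?<=foobar)|(?<=\r\n))x")
hits = [m.start() for m in split.finditer("foobarx\r\nxbx")]
print(hits)  # [6, 9]
```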