I recently wrote a rather long article on this topic: The true power of regular expressions.
To summarize:
- Regular expressions with support for recursive subpattern references can match all context-free languages (e.g
a^n b^n
). - Regular expressions with lookaround assertions and subpattern references can match at least some context-sensitive languages (e.g.
ww
anda^n b^n c^n
). - If the assertions have unlimited width (as you say), then all context-sensitive grammars can be matched. I don’t know any regex flavor though that does not have fixed-width restrictions on lookbehind (and at the same time supports subpattern references).
- Regular expressions with backreferences are NP-complete, so any other NP problem can be solved using regular expressions (after applying a polynomial-time transformation).
Some examples:
-
Matching the context-free language
{a^n b^n, n>0}
:/^(a(?1)?b)$/ # or /^ (?: a (?= a* (\1?+ b) ) )+ \1 $/x
-
Matching the context-sensitive language
{a^n b^n c^n, n>0}
:/^ (?=(a(?-1)?b)c) a+(b(?-1)?c) $/x # or /^ (?: a (?= a* (\1?+ b) b* (\2?+ c) ) )+ \1 \2 $/x