What support is there for PCRE (Perl Compatible Regular Expressions) in common languages?

It seems that most mainstream languages ship their own implementation of “Perl-like” regexes rather than linking against libpcre. Languages that fall into this class include (at the very least) Java, JavaScript, and Python.

Java’s java.util.regex library uses a syntax that’s very heavily based on Perl (approx. version 5.8) regexes, including the rules for escaping, the \p and \P Unicode classes, non-greedy and “possessive” quantifiers, backreferences, \Q..\E quoting, and several of the (?...) constructs including non-capturing groups, zero-width lookahead and lookbehind, and non-backtracking (atomic) groups. In fact Java regexes seem to have more in common with Perl regexes than libpcre does. 🙂

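The JavaScript language also uses regexes that are derived from Perl. Possessive quantifiers, non-backtracking groups, and \Q..\E quoting are absent. Unicode classes and lookbehind were also missing originally, but ES2018 added lookbehind and \p{...}/\P{...} property escapes (the latter only under the u flag). The rest of what I mentioned for Java is present in JS as well.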

Python’s regex syntax (the standard re module) is also based on Perl 5’s, with non-greedy quantifiers, most of the (?...) constructs including non-capturing groups, lookahead and lookbehind, and conditional patterns, as well as named capture groups (written (?P<name>...), a different syntax than either Perl or PCRE). Non-backtracking groups and ‘possessive’ quantifiers were absent for a long time and only arrived in Python 3.11. The \p and \P Unicode character classes are still absent from re, although the standard \d, \s, and \w classes are Unicode-aware: on request via re.UNICODE in Python 2, and by default for str patterns in Python 3, where re.ASCII switches it off.
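
To make the Python list a bit more concrete, here is a minimal sketch using the standard re module under Python 3. It touches on a non-greedy quantifier, a named group, lookbehind, a conditional pattern, and the Unicode-aware \d class. The sample strings and variable names are invented for illustration only.

```python
import re

# Non-greedy quantifier: .*? stops at the first closing tag instead of the last.
bold = re.findall(r"<b>(.*?)</b>", "<b>one</b> <b>two</b>")

# Named capture groups use Python's own (?P<name>...) spelling.
date = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-05")

# Zero-width lookbehind: digits only when preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $15, qty 3")

# Conditional pattern: require ">" only if group 1 (the "<") took part in the match.
tag = re.compile(r"(<)?\w+(?(1)>)")

# \d is Unicode-aware for str patterns in Python 3; re.ASCII restricts it to 0-9.
sample = "4\u0664"  # ASCII 4 followed by ARABIC-INDIC DIGIT FOUR
digits_unicode = re.findall(r"\d", sample)
digits_ascii = re.findall(r"\d", sample, re.ASCII)

if __name__ == "__main__":
    print(bold)
    print(date.group("year"), date.group("month"))
    print(prices)
    print(bool(tag.fullmatch("<tag>")), bool(tag.fullmatch("<tag")), bool(tag.fullmatch("tag")))
    print(len(digits_unicode), len(digits_ascii))
```

The conditional pattern is the least familiar of these: the "<tag" case fails because the optional group matched but the closing ">" it then demands is missing.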
