Regex negative lookbehind not valid in JavaScript

2020 update: JavaScript implementations are beginning to natively support regular expression lookbehinds. The RegExp Lookbehind Assertions proposal, which became part of the ECMA-262 specification in ECMAScript 2018, was implemented in V8’s Irregexp in Chrome 62+ (released 2017-10-17). Firefox 78+ (ESR, released 2020-06-30) picked it up by adopting Irregexp through a shim layer. Other JS engines are expected to follow.

See more detailed support listings here.
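If your code has to run on both kinds of engines, one option is to detect support at runtime. The following is a minimal sketch: `supportsLookbehind` is a hypothetical helper, not a built-in. It works by trying to compile a lookbehind pattern and catching the error.

```javascript
// Hypothetical helper (not a built-in): reports whether the running engine
// accepts lookbehind syntax. The patterns are passed to the RegExp
// constructor as strings, so an engine without lookbehind support throws a
// catchable SyntaxError here instead of rejecting the whole script at parse time.
function supportsLookbehind() {
  try {
    new RegExp("(?<=a)b");
    new RegExp("(?<!a)b");
    return true;
  } catch (e) {
    return false;
  }
}

if (supportsLookbehind()) {
  // Native positive lookbehind: replace "bar" only where it follows "foo".
  console.log("foobar bar".replace(new RegExp("(?<=foo)bar", "g"), "baz")); // prints "foobaz bar"
} else {
  // Fall back to one of the legacy workarounds.
}
```

Building the pattern with `new RegExp(...)` is the important design choice here. A regex literal such as `/(?<=foo)bar/` would make an older engine reject the entire file before any check could run.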


Legacy workaround to implement lookbehinds

Older JavaScript engines lack support for regular expression lookbehinds like (?<=…) (positive) and (?<!…) (negative), but that doesn’t mean you can’t still implement this sort of logic in JavaScript.

Matching (not global)

Positive lookbehind match:

// from /(?<=foo)bar/i
var matcher = mystring.match( /foo(bar)/i );
if (matcher) {
  // do stuff with matcher[1] which is the part that matches "bar"
}
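The match consumes `foo`, so `matcher.index` points at `foo` rather than at `bar`. Below is a minimal sketch that also reports where `bar` starts. `findBarAfterFoo` is a hypothetical name, and the offset arithmetic assumes `bar` is the last part of the match, as it is here.

```javascript
// Hypothetical helper emulating /(?<=foo)bar/i without lookbehind.
// Returns the "bar" text and its position, or null if there is no match.
function findBarAfterFoo(str) {
  var matcher = str.match(/foo(bar)/i);
  if (!matcher) {
    return null;
  }
  // matcher.index is where "foo" starts, because "foo" is consumed;
  // the captured "bar" is the tail of the full match.
  var barIndex = matcher.index + matcher[0].length - matcher[1].length;
  return { text: matcher[1], index: barIndex };
}

var result = findBarAfterFoo("xx FooBar");
// result.text === "Bar", result.index === 6
```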

Fixed width negative lookbehind match:

// from /(?<!foo)bar/i
var matcher = mystring.match( /(?!foo)(?:^.{0,2}|.{3})(bar)/i );
if (matcher) {
  // do stuff with matcher[1] ("bar"), which does not follow "foo"
}

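Negative lookbehinds can be done without the global flag, but only with a fixed width, and you have to calculate that width yourself (which can get difficult with alternations). Using (?!foo).{3}(bar) would be simpler and roughly equivalent, but it won’t match a “bar” that has fewer than three characters before it, such as in a string starting with “rebar”. The ^.{0,2} alternative in the code above covers a “bar” that starts before the fourth character. For multiline input, note that . doesn’t match newlines and ^ matches only at the start of the whole string, so add the m flag if a “bar” near the start of a later line should also match.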
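To check the fixed-width version, you can wrap it in a function that returns the position of `bar` and compare the result with native lookbehind. The comparison only works in an engine that supports lookbehind, such as Chrome 62+ or Firefox 78+. `indexOfBarNotAfterFoo` is a hypothetical name, and the samples are deliberately single-line.

```javascript
// Hypothetical helper emulating /(?<!foo)bar/i with the fixed-width trick.
// Returns the index of the first "bar" not preceded by "foo", or -1.
function indexOfBarNotAfterFoo(str) {
  var matcher = str.match(/(?!foo)(?:^.{0,2}|.{3})(bar)/i);
  if (!matcher) {
    return -1;
  }
  // The match includes up to three characters before "bar",
  // and "bar" is always the last part of the match.
  return matcher.index + matcher[0].length - matcher[1].length;
}

// Compare with native lookbehind (requires an engine that supports it).
["rebar", "foobar", "foobar rebar", "bar", "fobar"].forEach(function (s) {
  console.log(s, indexOfBarNotAfterFoo(s), s.search(/(?<!foo)bar/i));
});
```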

If you need it with a variable width, use the global solution below and put a break at the end of the if block. (This limitation is quite common. .NET, Vim, and JGsoft are notable for supporting variable-width lookbehind, and native JavaScript lookbehind supports it too where available. PCRE, PHP, and Perl have traditionally been limited to fixed width. Python’s built-in re module requires fixed width, while the third-party regex module supports variable width. That said, the logic of the workaround below should work in any language that supports regex.)
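The suggestion above can be sketched for a variable-width case. This example emulates /(?<!fo+)bar/i, which matches a “bar” not preceded by an f followed by one or more o’s, and it stops at the first qualifying match. `firstBarNotAfterFoPlus` is a hypothetical name. The overlap caveat for the global workaround also applies here.

```javascript
// Hypothetical helper emulating /(?<!fo+)bar/i (variable-width lookbehind),
// returning only the first qualifying "bar", or null if there is none.
function firstBarNotAfterFoPlus(str) {
  var re = /(fo+)?bar/gi; // created once, before the loop
  var matcher;
  var found = null;
  while ((matcher = re.exec(str)) !== null) {
    if (!matcher[1]) {
      // The optional group didn't match, so the whole match is just "bar".
      found = { text: matcher[0], index: matcher.index };
      break; // stop after the first qualifying match
    }
  }
  return found;
}

firstBarNotAfterFoPlus("foooobar xbar"); // { text: "bar", index: 10 }
```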

Matching (global)

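When you need to loop over each match in a given string (the g modifier, global matching), you have to reassign the matcher variable on each loop iteration, and you must use RegExp.exec() with the RegExp created before the loop. String.match() interprets the global modifier differently: it returns all full matches at once without their capture groups, so using it as the loop condition never advances and creates an infinite loop! Creating the RegExp inside the loop has the same effect, since each new RegExp starts searching again from lastIndex 0.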
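The difference between match() with and without g is easy to see on a small string. In this sketch, the variable names are only for illustration.

```javascript
var text = "foobar FOOBAR";

// Without g, match() returns the first match plus its capture groups.
var first = text.match(/foo(bar)/i);
// first[0] === "foobar", first[1] === "bar", first.index === 0

// With g, match() returns every full match but drops the capture groups,
// so the "bar" part of each match is not available.
var all = text.match(/foo(bar)/gi);
// all is ["foobar", "FOOBAR"]

// Either form returns the same result every time it is called, so a loop like
//   while (matcher = text.match(/foo(bar)/gi)) { ... }
// never ends. exec() on a single g-flagged RegExp advances lastIndex instead.
```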

Global positive lookbehind:

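var re = /foo(bar)/gi;  // from /(?<=foo)bar/gi
var matcher;            // declared explicitly: assigning an undeclared variable throws in strict mode
while ( (matcher = re.exec(mystring)) !== null ) {
  // do stuff with matcher[1] which is the part that matches "bar"
}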

“Stuff” may of course include populating an array for further use.
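For instance, the following sketch collects each “bar” that follows “foo” together with its position. `collectBarsAfterFoo` is a hypothetical name. As in the non-global case, the index has to be shifted past the consumed `foo`.

```javascript
// Hypothetical helper emulating a global /(?<=foo)bar/gi search:
// collects every "bar" that follows "foo", with its position.
function collectBarsAfterFoo(str) {
  var re = /foo(bar)/gi; // created once so lastIndex carries over between calls
  var results = [];
  var matcher;
  while ((matcher = re.exec(str)) !== null) {
    results.push({
      text: matcher[1],
      // "foo" was consumed, so step past it to where "bar" begins
      index: matcher.index + matcher[0].length - matcher[1].length
    });
  }
  return results;
}

var found = collectBarsAfterFoo("foobar bar FOOBAR");
// found is [{ text: "bar", index: 3 }, { text: "BAR", index: 14 }]
```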

Global negative lookbehind:

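var re = /(foo)?bar/gi;  // from /(?<!foo)bar/gi
var matcher;             // declared explicitly: assigning an undeclared variable throws in strict mode
while ( (matcher = re.exec(mystring)) !== null ) {
  if (!matcher[1]) {
    // do stuff with matcher[0] ("bar"), which does not follow "foo"
  }
}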

Note that there are cases in which this will not fully reproduce the negative lookbehind. Consider /(?<!ba)ll/g matching against Fall ball bill balll llama. It finds only three of the desired four matches. When it parses balll, it matches ball (a rejected match) and resumes after it, at l llama, so the ll at the end of balll is never tried. This only happens when the text consumed by a rejected match overlaps the start of a wanted match (balll breaks (ba)?ll, but foobarbar is fine with (foo)?bar). The fixed width method above, adapted to /(?!ba)(?:^.{0,1}|.{2})(ll)/g, doesn’t consume a rejected ba. However, it does consume up to two characters before each ll, which on this string makes it miss the ll in llama. Where native lookbehind is available, it is the dependable fix.
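The example above can be reproduced by comparing match positions from the workaround with those from native lookbehind. The native half needs an engine that supports lookbehind. The function names are hypothetical.

```javascript
var text = "Fall ball bill balll llama";

// Workaround for /(?<!ba)ll/g: match an optional "ba" and skip those matches.
function workaroundIndexes(str) {
  var re = /(ba)?ll/g;
  var indexes = [];
  var matcher;
  while ((matcher = re.exec(str)) !== null) {
    if (!matcher[1]) {
      indexes.push(matcher.index);
    }
  }
  return indexes;
}

// Native lookbehind, for comparison (needs an engine that supports it).
function nativeIndexes(str) {
  var re = /(?<!ba)ll/g;
  var indexes = [];
  var matcher;
  while ((matcher = re.exec(str)) !== null) {
    indexes.push(matcher.index);
  }
  return indexes;
}

workaroundIndexes(text); // [2, 12, 21]: the "ll" at index 18 (end of "balll") is skipped
nativeIndexes(text);     // [2, 12, 18, 21]
```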

Replacing

Mimicking Lookbehind in JavaScript is a great article that describes how to do this.

It even has a follow-up that points to a collection of short functions that implement this in JS.

Implementing lookbehind in String.replace() is much easier since you can create an anonymous function as the replacement and handle the lookbehind logic in that function.

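Without the g modifier, these handle only the first “bar” found, whether or not it satisfies the lookbehind condition. So if the first “bar” doesn’t qualify, nothing is replaced, even when a later one would. Adding the g modifier makes them global, which avoids that problem.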

Positive lookbehind replacement:

// assuming you wanted mystring.replace(/(?<=foo)bar/i, "baz"):
mystring = mystring.replace( /(foo)?bar/i,
  function ($0, $1) { return ($1 ? $1 + "baz" : $0) }
);

This replaces bar with baz only where it follows foo. If it does, $1 holds the matched foo, and the ternary operator (?:) returns that text followed by the replacement text, dropping the bar part. Otherwise, the ternary operator returns the original match, $0, unchanged.
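Here is a runnable, global version of that replacement, with the sample string chosen for illustration. Reusing $1 preserves the original case of foo.

```javascript
var text = "foobar bar FOObar";

// Global version of the positive-lookbehind replacement above.
var replaced = text.replace(/(foo)?bar/gi, function ($0, $1) {
  // $1 is undefined when the optional (foo) group did not participate
  return $1 ? $1 + "baz" : $0;
});
// replaced === "foobaz bar FOObaz"
```

For the positive case, the optional group isn’t strictly needed. text.replace(/(foo)bar/gi, "$1baz") gives the same result here, because a bare bar is simply left alone.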

Negative lookbehind replacement:

// assuming you wanted mystring.replace(/(?<!foo)bar/i, "baz"):
mystring = mystring.replace( /(foo)?bar/i,
  function ($0, $1) { return ($1 ? $0 : "baz") }
);

This is essentially the same, but since it’s a negative lookbehind, it acts when $1 is missing. There’s no need for $1 + "baz" here, because in that branch $1 didn’t match.
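Here is the corresponding global version for the negative case, on an illustrative string:

```javascript
var text = "foobar bar xbar";

// Global version of the negative-lookbehind replacement above.
var replaced = text.replace(/(foo)?bar/gi, function ($0, $1) {
  // keep "foobar" untouched; replace a "bar" that doesn't follow "foo"
  return $1 ? $0 : "baz";
});
// replaced === "foobar baz xbaz"
```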

This has the same caveat as the global negative lookbehind workaround above (input such as balll), and the same remedies apply.
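Adding g is the simplest route. If you instead want exactly what the non-global /(?<!foo)bar/i would do, which is to replace only the first “bar” that doesn’t follow “foo”, one approach is to run the global regex and stop replacing after the first qualifying match. In the sketch below, `replaceFirstBarNotAfterFoo` is a hypothetical name, and the overlap caveat still applies.

```javascript
// Hypothetical helper: mimics mystring.replace(/(?<!foo)bar/i, "baz"),
// i.e. replaces only the FIRST "bar" that does not follow "foo".
function replaceFirstBarNotAfterFoo(str) {
  var done = false; // becomes true once one replacement has been made
  return str.replace(/(foo)?bar/gi, function ($0, $1) {
    if ($1 || done) {
      return $0; // follows "foo", or we already replaced one: keep as-is
    }
    done = true;
    return "baz";
  });
}

replaceFirstBarNotAfterFoo("foobar bar bar"); // "foobar baz bar"
```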
