Regex that can match an empty string is breaking the JavaScript regex engine

JS works differently than PCRE here. The JS regex engine does not handle zero-length matches well: after a zero-length match, the index is simply incremented by one, so the character at that position can no longer start a match. The ^-? pattern can match an empty string, so it matches at the start of 12,345,678.90, and the 1 is then skipped.
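The skipped character is easy to reproduce. The full original pattern is not shown here, so this sketch assumes a hypothetical global pattern, /^-?|\d+/g, whose first alternative can match an empty string, and compares it with a variant where the hyphen is required:

```javascript
// Hypothetical pattern for illustration: ^-? can match an empty string.
const withEmptyMatch = "12,345,678.90".match(/^-?|\d+/g);

// Same pattern with a required hyphen: no alternative can match empty.
const withoutEmptyMatch = "12,345,678.90".match(/^-|\d+/g);

console.log(withEmptyMatch);    // [ '', '2', '345', '678', '90' ]
console.log(withoutEmptyMatch); // [ '12', '345', '678', '90' ]
```

In the first result, the empty match at index 0 pushes the next attempt to index 1, so the leading 1 never becomes part of a match.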

If we have a look at the String#match specification (ECMAScript 5), we will see that, when match is called with a global regex, the regex object’s lastIndex is incremented after each zero-length match is found:

  8. Else, global is true
    a. Call the [[Put]] internal method of rx with arguments “lastIndex” and 0.
    b. Let A be a new array created as if by the expression new Array() where Array is the standard built-in constructor with that name.
    c. Let previousLastIndex be 0.
    d. Let n be 0.
    e. Let lastMatch be true.
    f. Repeat, while lastMatch is true
        i. Let result be the result of calling the [[Call]] internal method of exec with rx as the this value and argument list containing S.
        ii. If result is null, then set lastMatch to false.
        iii. Else, result is not null
            1. Let thisIndex be the result of calling the [[Get]] internal method of rx with argument “lastIndex”.
            2. If thisIndex = previousLastIndex then
                a. Call the [[Put]] internal method of rx with arguments “lastIndex” and thisIndex+1.
                b. Set previousLastIndex to thisIndex+1.

So, the matching process goes through steps 8a to 8e, initializing the auxiliary structures, and then enters a loop (8f) that repeats while lastMatch is true. An internal exec call matches the empty string at the start of the input (8fi -> 8fiii). As the result is not null, thisIndex is set to the regex's current lastIndex, which exec leaves at the end of the match. Because the match was zero-length, thisIndex equals previousLastIndex, so both lastIndex and previousLastIndex are set to thisIndex+1. The next exec call therefore starts one position later, which is what skips the current position after a successful zero-length match.
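The loop described above can be sketched in plain JavaScript. This is only an illustration of the quoted steps, not the engine's actual implementation: globalMatch is a hypothetical helper name, the /^-?|\d+/g pattern is an assumed stand-in for the original one, and the else branch and the final null check follow the remaining steps of the ES5 algorithm, which are not part of the excerpt.

```javascript
// Sketch of the global branch of String#match, following the steps above.
// globalMatch is a hypothetical helper, not a built-in.
function globalMatch(rx, s) {
  if (!rx.global) {
    throw new TypeError("this sketch only covers regexes with the g flag");
  }
  rx.lastIndex = 0;              // 8a
  const matches = [];            // 8b
  let previousLastIndex = 0;     // 8c
  let lastMatch = true;          // 8e
  while (lastMatch) {            // 8f
    const result = rx.exec(s);   // 8f.i
    if (result === null) {       // 8f.ii
      lastMatch = false;
    } else {                     // 8f.iii
      const thisIndex = rx.lastIndex;
      if (thisIndex === previousLastIndex) {
        // lastIndex did not advance (zero-length match): step past this position.
        rx.lastIndex = thisIndex + 1;
        previousLastIndex = thisIndex + 1;
      } else {
        // From the rest of the ES5 algorithm, beyond the quoted excerpt.
        previousLastIndex = thisIndex;
      }
      matches.push(result[0]);
    }
  }
  return matches.length === 0 ? null : matches;
}

// Assumed example pattern: ^-? matches the empty string at index 0.
console.log(globalMatch(/^-?|\d+/g, "12,345,678.90"));
// [ '', '2', '345', '678', '90' ]
```

After the empty match at index 0, lastIndex is forced to 1, so the next exec call finds 2 rather than 12.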

You may actually use a simpler regex inside a replace call, with a callback that picks the appropriate replacement:

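// Group 1 is set only for the last non-digit (the decimal separator);
// every other match, including the leading hyphen, is removed.
var res = "-12,345,678.90".replace(/(\D)(?!.*\D)|^-|\D/g, function($0, $1) {
   return $1 ? "." : "";
});
console.log(res); // 12345678.90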

Pattern details:

  • (\D)(?!.*\D) – a non-digit (captured into Group 1) that is not followed by 0+ chars other than line break chars and then another non-digit, i.e. the last non-digit on the line
  • | – or
  • ^- – a hyphen at the string start
  • | – or
  • \D – a non-digit

Note that here you do not even have to make the hyphen at the start optional: none of the alternatives can match an empty string, so no position gets skipped.
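As a usage sketch, the replace call above can be wrapped in a small function and tried on a few differently formatted inputs. The name normalizeNumber is hypothetical, and the sketch assumes every input has a decimal separator as its last non-digit character; note that the callback returns an empty string for a leading hyphen, so the sign is dropped.

```javascript
// normalizeNumber is a hypothetical wrapper around the replace call above.
// Assumption: the last non-digit in the input is the decimal separator.
function normalizeNumber(input) {
  return input.replace(/(\D)(?!.*\D)|^-|\D/g, (match, lastNonDigit) =>
    // Only the last non-digit sets the capture group; it becomes ".".
    lastNonDigit ? "." : ""
  );
}

console.log(normalizeNumber("-12,345,678.90")); // 12345678.90
console.log(normalizeNumber("12.345.678,90"));  // 12345678.90
console.log(normalizeNumber("1 234 567,89"));   // 1234567.89
```

Because replace leaves unmatched text untouched, the digits never need to be matched at all, which is why no empty-matching alternative is required.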
