How can I match overlapping strings with regex?

String#match called with a global-flag regex returns an array of the matched substrings. The /\d{3}/g regex matches and consumes a 3-digit sequence, that is, it reads the sequence into the buffer and advances its index to the position right after the last matched character. Thus, after “eating up” 123, the index is located after 3, and the only substring left for parsing is 45, so there is no further match.
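A quick way to see this consuming behavior, as a minimal sketch (the variable names input and consumed are only illustrative):

```javascript
// A consuming pattern with the global flag: each new match attempt
// starts where the previous match ended, so "234" and "345" are never tried.
var input = '12345';
var consumed = input.match(/\d{3}/g);

console.log(consumed); // ["123"]
```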

I think the technique used at regex101.com is also worth considering here: use a zero-width assertion (a positive lookahead with a capturing group) to test all positions inside the input string. After each test, the regex's lastIndex (a read/write integer property of regular expression objects that specifies the index at which to start the next match) is advanced “manually” to avoid an infinite loop.

Note that this technique is built into .NET (Regex.Matches), Python (re.findall), PHP (preg_match_all), and Ruby (String#scan), and it can be used in Java, too.

Here is a demo using matchAll, which advances past empty matches automatically:

var re = /(?=(\d{3}))/g;
console.log( Array.from('12345'.matchAll(re), x => x[1]) );

Here is an ES5 compliant demo:

var re = /(?=(\d{3}))/g;
var str = "12345";
var m, res = [];

while (m = re.exec(str)) {
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    res.push(m[1]);
}

console.log(res);

Here is a regex101.com demo

Note that the same result can be obtained with a “regular” consuming \d{3} pattern by manually setting re.lastIndex to m.index + 1 after each successful match:

var re = /\d{3}/g;
var str = "12345";
var m, res = [];

while (m = re.exec(str)) {
    res.push(m[0]);
    re.lastIndex = m.index + 1; // <- Important
}
console.log(res);
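
If this loop is needed in more than one place, it could be wrapped in a small helper. The sketch below is only one possible way to do that: overlappingMatches is a hypothetical name, not a built-in, and it assumes the pattern is passed as a source string so that a fresh global regex (with lastIndex at 0) is created on every call.

```javascript
// Hypothetical helper (not a built-in): collects overlapping matches
// of a consuming pattern by restarting one character after each match start.
function overlappingMatches(source, str) {
    var re = new RegExp(source, 'g'); // fresh regex, so lastIndex starts at 0
    var m, res = [];
    while ((m = re.exec(str)) !== null) {
        res.push(m[0]);
        re.lastIndex = m.index + 1; // always moves forward, so the loop terminates
    }
    return res;
}

console.log(overlappingMatches('\\d{3}', '12345')); // ["123", "234", "345"]
console.log(overlappingMatches('aba', 'ababa'));    // ["aba", "aba"]
```

Because lastIndex is always set past the start of the previous match, the loop also terminates for patterns that can match an empty string.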
