Regex lazy quantifier behaves greedily

The \[.*?\]\[2\] pattern works like this:

  • \[ – matches the leftmost [ (the regex engine scans the input string from left to right)
  • .*? – matches 0 or more chars other than line break chars, as few as possible but as many as needed for the overall match to succeed, since the subsequent pattern (see below) must also match
  • \]\[2\] – matches the literal ][2] substring.

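So, after each failed attempt, .*? expands by one more char until the leftmost ][2] is found. Note that lazy quantifiers do not guarantee the "shortest" match: the engine still returns the match that starts at the leftmost possible position, and laziness only minimizes what .*? consumes from that starting point.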
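A minimal sketch of this behavior, assuming Python's re module as the regex flavor (the original does not name one) and a hypothetical sample string with two bracket pairs, where only the second pair is followed by [2]:

```python
import re

# Hypothetical sample input: only "[b]" is followed by "[2]"
text = "[a][1] [b][2]"

lazy = re.search(r"\[.*?\]\[2\]", text)
# The match starts at the leftmost "[" (index 0), not at "[b]":
# the engine only moves on to a later start position after every
# expansion of .*? from index 0 has failed, and here one succeeds.
print(lazy.group())  # [a][1] [b][2]
print(lazy.start())  # 0
```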

Solution

Instead of .*? (or .*), use a negated character class that matches any char except the boundary chars.

\[[^\]\[]*\]\[2\]


Here, .*? is replaced with [^\]\[]* – 0 or more chars other than ] and [.
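Applied to the same hypothetical sample string (again assuming Python's re module), the negated character class keeps the match inside a single bracket pair:

```python
import re

text = "[a][1] [b][2]"

# [^\]\[]* cannot cross a "]" or "[", so an attempt starting at "[a]"
# fails instead of running on into the next bracket pair.
fixed = re.search(r"\[[^\]\[]*\]\[2\]", text)
print(fixed.group())  # [b][2]
```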

Other examples:

  • Strings between angle brackets: <[^<>]*> matches <...> with no < and > inside
  • Strings between parentheses: \([^()]*\) matches (...) with no ( and ) inside
  • Strings between double quotation marks: "[^"]*" matches "..." with no " inside
  • Strings between curly braces: \{[^{}]*} matches {...} with no { and } inside
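The four patterns above can be checked with a short script. This is a sketch assuming Python's re module (where a lone } is treated as a literal) and a hypothetical input with nested delimiters; since each negated class forbids both boundary chars, only the innermost pair is returned:

```python
import re

# Hypothetical sample containing nested delimiters
text = 'f(a(b)c) <x<y>> "q" {k{v}}'

patterns = {
    "angle": r"<[^<>]*>",
    "paren": r"\([^()]*\)",
    "quote": r'"[^"]*"',
    "curly": r"\{[^{}]*}",
}

for name, pattern in patterns.items():
    print(name, re.findall(pattern, text))
# angle ['<y>']
# paren ['(b)']
# quote ['"q"']
# curly ['{v}']
```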

In other situations, when the starting pattern is a multicharacter string or a more complex pattern, use a tempered greedy token, (?:(?!start).)*?. For example, to match abc 1 def in abc 0 abc 1 def, use abc(?:(?!abc).)*?def.
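A minimal comparison, assuming Python's re module, of a plain lazy dot versus the tempered greedy token on the example string:

```python
import re

text = "abc 0 abc 1 def"

# Plain lazy dot: the match starts at the first "abc" and runs to "def".
plain = re.search(r"abc.*?def", text)
print(plain.group())  # abc 0 abc 1 def

# Tempered greedy token: a char is consumed only if "abc" does not start
# at that position, so the match cannot span a second "abc".
tempered = re.search(r"abc(?:(?!abc).)*?def", text)
print(tempered.group())  # abc 1 def
```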
