How can I match nested brackets using regex?

Many regex implementations will not allow you to match an arbitrary amount of nesting. However, Perl, PHP and .NET support recursive patterns.

A demo in Perl:

#!/usr/bin/perl -w

my $text="(outer
   (center
     (inner)
     (inner)
   center)
 ouer)
 (outer
   (inner)
 ouer)
 (outer
 ouer)";

while($text =~ /(\(([^()]|(?R))*\))/g) {
  print("----------\n$1\n");
}

which will print:

----------
(outer
   (center
     (inner)
     (inner)
   center)
 ouer)
----------
(outer
   (inner)
 ouer)
----------
(outer
 ouer)

Or, the PHP equivalent:

$text="(outer
   (center
     (inner)
     (inner)
   center)
 ouer)
 (outer
   (inner)
 ouer)
 (outer
 ouer)";

preg_match_all('/(\(([^()]|(?R))*\))/', $text, $matches);

print_r($matches);

which produces:

Array
(
    [0] => Array
        (
            [0] => (outer
   (center
     (inner)
     (inner)
   center)
 ouer)
            [1] => (outer
   (inner)
 ouer)
            [2] => (outer
 ouer)
        )

...

An explanation:

(          # start group 1
  \(       #   match a literal '('
  (        #   group 2
    [^()]  #     any char other than '(' and ')'
    |      #     OR
    (?R)   #     recursively match the entir pattern
  )*       #   end group 2 and repeat zero or more times
  \)       #   match a literal ')'
)          # end group 1

EDIT

Note @Goozak’s comment:

A better pattern might be \(((?>[^()]+)|(?R))*\) (from PHP:Recursive patterns). For my data, Bart’s pattern was crashing PHP when it encountered a (long string) without nesting. This pattern went through all my data without problem.

Leave a Comment