Capture groups are numbered left-to-right in the order they occur in the regex, not in the order they are matched. Here is a simplified view of your regex:
m/
(.+?) # group 1
(?: # the $code_block regex
(?&block)
(?(DEFINE)
(?<block> ... ) # group 2
)
)
(.+) # group 3
/xs
Named groups can also be accessed as numbered groups. The 2nd group is the block group. However, this group is only used as a named subpattern, not as a capture. As such, the $2 capture value is undef. As a consequence, the text after the code-block will be stored in capture $3.
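The numbering can be checked with a small script. This is only a sketch: the $code_block below is a hypothetical stand-in that matches a single brace-delimited block, not your actual regex, and the sample text is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $code_block: it invokes a named subpattern
# that is declared inside a (?(DEFINE)...) group in the same fragment.
my $code_block = qr/(?&block)(?(DEFINE)(?<block>\{[^{}]*\}))/;

my $text = "before {code} after";

# In list context a successful match returns all capture groups,
# with undef for groups that did not capture anything.
my @groups = $text =~ m/(.+?)$code_block(.+)/s;

for my $i (0 .. $#groups) {
    my $shown = defined $groups[$i] ? "'$groups[$i]'" : 'undef';
    print '$', $i + 1, " = $shown\n";
}
```

Under these assumptions the match yields three groups: the first holds the text before the block, the second (the block group) is undef, and the third holds the text after the block.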
There are two ways to deal with this problem:
- For complex regexes, only use named capture. Consider a regex to be complex as soon as you assemble it from regex objects, or if captures are conditional. Here:
  if ($text =~ m/(?<before>.+?)$code_block(?<afterwards>.+)/s) {
      print $+{before};
      print $+{afterwards};
  }
- Put all your defines at the end, where they can’t mess up your capture numbering. For example, your $code_block regex would only define a named pattern which you then invoke explicitly.
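The second option might look roughly like this. It is a minimal sketch: $define_block and its brace-matching pattern are hypothetical names and contents chosen for illustration, standing in for whatever your real block pattern is.

```perl
use strict;
use warnings;

# Hypothetical fragment that only *defines* the named pattern.
# A (?(DEFINE)...) group never matches any text on its own.
my $define_block = qr/(?(DEFINE)(?<block>\{[^{}]*\}))/;

my $text = "before {code} after";

# The block is invoked explicitly with (?&block); the defines come last,
# so the two real captures are numbered 1 and 2.
my ($before, $after) = $text =~ m/(.+?)(?&block)(.+)$define_block/s;

print "before: '$before'\n";
print "after:  '$after'\n";
```

Because the define group now sits after both capturing groups, it takes the last number and the captures you care about stay at $1 and $2.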