Regex to match a C-style multiline comment

The best multiline comment regex is an unrolled version of (?s)/\*.*?\*/, which looks like this:

String pat = "/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/";

See the regex demo and explanation at regex101.com.
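Here is a minimal sketch of how the pattern might be used from Java to pull every comment out of a piece of source text. The class and method names (CommentFinder, findComments) are made up for illustration; only Pattern, Matcher.find() and Matcher.group() come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class CommentFinder {
    // The unrolled pattern from above. No (?s) flag is needed because
    // [^*] and [^/*] already match line breaks.
    static final Pattern COMMENT =
            Pattern.compile("/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/");

    // Returns every C-style comment found in the input, in order.
    static List<String> findComments(String source) {
        List<String> comments = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            comments.add(m.group());
        }
        return comments;
    }

    public static void main(String[] args) {
        String code = "int a = 1; /* first\n * second line **/\nint b; /***/ int c;";
        for (String c : findComments(code)) {
            System.out.println("[" + c + "]");
        }
    }
}
```

Keep in mind that a plain regex knows nothing about string literals, so a /* ... */ sequence inside a string literal would be matched as well.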

In short,

  • /\* – match the comment start /*
  • [^*]*\*+ – match 0+ characters other than * followed by 1+ literal *
  • (?:[^/*][^*]*\*+)* – 0+ sequences of:
    • [^/*][^*]*\*+ – a character other than / or * ([^/*]), followed by 0+ non-asterisk characters ([^*]*), followed by 1+ asterisks (\*+)
  • / – closing /
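To make the breakdown concrete, the sketch below assembles the same pattern from the four pieces listed above and checks it against a few edge cases. The class name and the constant names (OPEN, FIRST_CHUNK, MORE_CHUNKS, CLOSE) are hypothetical, chosen only to mirror the list.

```java
import java.util.regex.Pattern;

// Hypothetical class that mirrors the bullet list, for illustration only.
class UnrolledCommentRegex {
    static final String OPEN = "/\\*";             // comment start /*
    static final String FIRST_CHUNK = "[^*]*\\*+"; // 0+ non-* chars, then 1+ asterisks
    // 0+ sequences of: a char other than / or *, 0+ non-* chars, 1+ asterisks
    static final String MORE_CHUNKS = "(?:[^/*][^*]*\\*+)*";
    static final String CLOSE = "/";               // closing /

    static final Pattern COMMENT =
            Pattern.compile(OPEN + FIRST_CHUNK + MORE_CHUNKS + CLOSE);

    // True if the whole string is exactly one C-style comment.
    static boolean isComment(String s) {
        return COMMENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "/**/",                  // empty comment
            "/***/",                 // only asterisks
            "/* a * b ** c */",      // asterisks inside the body
            "/* line 1\n line 2 */", // multiline
            "/* unterminated",       // no closing */
            "/* a */ b */"           // the comment ends at the first */
        };
        for (String s : samples) {
            System.out.println(isComment(s) + "  " + s.replace("\n", "\\n"));
        }
    }
}
```

The last sample shows why each repeated chunk starts with [^/*]: right after a run of asterisks, a / can only be consumed by the closing /, so a match can never run past the first */.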

David’s regex needs 26 steps to find the match in my example string, while mine needs just 12. With huge inputs, David’s regex may fail with a stack overflow or a similar error. The lazy .*? is inefficient because the engine expands it one character at a time and, at each position, retries the rest of the pattern (\*/). My pattern, in contrast, matches linear chunks of text (whole runs of non-asterisk characters) in one go.
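The step counts above come from regex101; Java's regex API does not report steps, so this sketch only checks that the lazy pattern and the unrolled pattern find exactly the same comments on a sample input that includes one long comment and one unterminated comment. It does not measure speed. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class, for illustration only.
class LazyVsUnrolled {
    // Lazy-dot version: (?s) makes . match line breaks too.
    static final Pattern LAZY = Pattern.compile("(?s)/\\*.*?\\*/");
    // Unrolled version: runs of non-* characters are consumed in one go.
    static final Pattern UNROLLED =
            Pattern.compile("/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/");

    // Collects every match of p in s, in order.
    static List<String> allMatches(Pattern p, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Several comments, one long comment, and an unterminated "/* open" at the end.
    static String sampleInput() {
        return "a /* one */ b /** two\n * more **/ c /***/ d /*"
                + "x".repeat(10_000) + "*/ e /* open";
    }

    public static void main(String[] args) {
        String input = sampleInput();
        List<String> lazy = allMatches(LAZY, input);
        List<String> unrolled = allMatches(UNROLLED, input);
        System.out.println("same matches: " + lazy.equals(unrolled));
        System.out.println("comments found: " + unrolled.size());
    }
}
```

String.repeat requires Java 11 or later.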
