Explanation
Nested group doesn’t automatically causes catastrophic backtracking. In your case, it is because your regex degenerates to the classical example of catastrophic backtracking (a*)*
.
Since \s
in optional in ^([a-zA-Z0-9'-]+\s?)*$
, on input without any spaces but has characters outside the allowed list, the regex simply degenerates to ^([a-zA-Z0-9'-]+)*$
.
You can also think in term of expansion of the original regex:
[a-zA-Z0-9'-]+\s?[a-zA-Z0-9'-]+\s?[a-zA-Z0-9'-]+\s?[a-zA-Z0-9'-]+\s?...
Since \s
is optional, we can remove it:
[a-zA-Z0-9'-]+[a-zA-Z0-9'-]+[a-zA-Z0-9'-]+[a-zA-Z0-9'-]+...
And we got a series of consecutive [a-zA-Z0-9'-]+
, which will try all ways to distribute the characters between themselves and blow up the complexity.
Solution
The standard way to write a regex to match token delimiter token ... delimiter token
is token (delimiter token)*
. While it is possible to rewrite the regex avoid repeating token
, I’d recommend against it, since it is harder to get it right. To avoid repetition , you might want to construct the regex by string concatenation instead.
Following the recipe above:
^[a-zA-Z0-9'-]+(\s[a-zA-Z0-9'-]+)*$
Although you can see repetition in repetition here, there is no catastrophic backtracking, since the regex can only expand to:
[a-zA-Z0-9'-]+\s[a-zA-Z0-9'-]+\s[a-zA-Z0-9'-]+\s[a-zA-Z0-9'-]+...
And \s
and [a-zA-Z0-9'-]
are mutual exclusive – there is only one way to match any string.