Python re infinite execution

Your regex runs into catastrophic backtracking because you have nested quantifiers (([...]+)*). Since your regex requires the string to end in / (which fails on your example), the regex engine tries all permutations of the string in the vain hope to find a matching combination. That’s where it gets stuck.

To illustrate, let’s assume "A*BCD" as the input to your regex and see what happens:

  1. (\w+) matches A. Good.
  2. \* matches *. Yay.
  3. [\w\s]+ matches BCD. OK.
  4. / fails to match (no characters left to match). OK, let’s back up one character.
  5. / fails to match D. Hum. Let’s back up some more.
  6. [\w\s]+ matches BC, and the repeated [\w\s]+ matches D.
  7. / fails to match. Back up.
  8. / fails to match D. Back up some more.
  9. [\w\s]+ matches B, and the repeated [\w\s]+ matches CD.
  10. / fails to match. Back up again.
  11. / fails to match D. Back up some more, again.
  12. How about [\w\s]+ matches B, repeated [\w\s]+ matches C, repeated [\w\s]+ matches D? No? Let’s try something else.
  13. [\w\s]+ matches BC. Let’s stop here and see what happens.
  14. Darn, / still doesn’t match D.
  15. [\w\s]+ matches B.
  16. Still no luck. / doesn’t match C.
  17. Hey, the whole group is optional (...)*.
  18. Nope, / still doesn’t match B.
  19. OK, I give up.

Now that was a string of just three letters. Yours had about 30, trying all permutations of which would keep your computer busy until the end of days.

I suppose what you’re trying to do is to get the strings before/after *, in which case, use

pattern = r"(\w+)\*([\w\s]+)$"

Leave a Comment