Regex – Repeating Capturing Group

Regex doesn’t support what you’re trying to do. When the engine enters the capturing group a second time, it overwrites what it captured the first time. Consider a simple example (thanks, regular-expressions.info): /(abc|123)+/ applied to ‘abc123’. The engine matches “abc”, then sees the plus and tries again, matching “123”. The final capturing group …
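The overwrite behaviour described above can be checked directly. The following is a minimal sketch using Python's re module (the choice of Python and the variable names are assumptions for illustration; the excerpt itself is engine-agnostic). It also shows one common workaround, matching each repetition separately, which may or may not be what the full answer recommends.

```python
import re

text = "abc123"

# A repeated capturing group keeps only the text from its LAST iteration.
m = re.fullmatch(r"(abc|123)+", text)
whole_match = m.group(0)    # the entire matched string
last_capture = m.group(1)   # only the final repetition survives

# One common workaround: drop the quantifier and collect every
# repetition as its own match.
pieces = re.findall(r"abc|123", text)

print(whole_match, last_capture, pieces)
```

Here group 1 holds only "123", while the findall call recovers both "abc" and "123".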

Can’t use ‘\1’ backreference to capture-group in a function call in re.sub() repr expression

The reason re.sub(r'([0-9])', A[int(r'\g<1>')], S) does not work is that the \g<1> backreference (an unambiguous way of writing the first backreference, otherwise written as \1) only works when it is used in the replacement string pattern. If you pass it to another method, that method will “see” just the literal string \g<1>, since the re module won’t have any chance …
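A usual way around this in Python is to pass a callable as the replacement, so the group text is available when the lookup runs. The sketch below assumes A is a list indexed by digit and S is the input string; both values are hypothetical stand-ins, since the excerpt does not show the original data.

```python
import re

# Hypothetical data: A maps each digit to a word, S is the input string.
A = ["zero", "one", "two", "three", "four",
     "five", "six", "seven", "eight", "nine"]
S = "call 42"

# re.sub accepts a function as the replacement. It is called once per
# match with the match object, so m.group(1) is the real captured text
# rather than the literal string r"\g<1>".
result = re.sub(r"([0-9])", lambda m: A[int(m.group(1))], S)

print(result)
```

With this input, each digit is replaced by its word, giving "call fourtwo".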

Are non-capturing groups redundant?

Your (?:wo)?men and (wo)?men are semantically equivalent but technically different: the first uses a non-capturing group and the second a capturing group. So the question is, why use non-capturing groups when we have capturing ones? Non-capturing groups are sometimes helpful. To avoid an excessive number of backreferences (remember that it is sometimes difficult …
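The difference between the two patterns can be seen in a short Python sketch (the sample text and variable names are assumptions for illustration). Both patterns match the same substrings, but only the capturing version creates a numbered group, which changes what functions such as re.findall return.

```python
import re

text = "women and men"

capturing = re.compile(r"(wo)?men")
non_capturing = re.compile(r"(?:wo)?men")

# Both patterns match exactly the same substrings.
cap_matches = [m.group(0) for m in capturing.finditer(text)]
non_matches = [m.group(0) for m in non_capturing.finditer(text)]

# Only the capturing pattern defines a numbered group.
cap_group_count = capturing.groups
non_group_count = non_capturing.groups

# findall returns group contents when a group exists, and whole
# matches when there is none.
cap_findall = capturing.findall(text)
non_findall = non_capturing.findall(text)

print(cap_matches, non_matches)
print(cap_group_count, non_group_count)
print(cap_findall, non_findall)
```

In this sketch the capturing pattern makes findall return the group contents ("wo" and an empty string) instead of the matched words, which is one practical reason to prefer (?:...) when the group exists only for grouping.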

Regex group capture in R with multiple capture-groups

str_match(), from the stringr package, will do this. It returns a character matrix with one column for each group in the match (and one for the whole match):

    > s = c("(sometext :: 0.1231313213)", "(moretext :: 0.111222)")
    > str_match(s, "\\((.*?) :: (0\\.[0-9]+)\\)")
         [,1]                         [,2]       [,3]
    [1,] "(sometext :: 0.1231313213)" "sometext" "0.1231313213"
    [2,] "(moretext :: 0.111222)" …
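For readers comparing with the Python entries on this page, the same pattern can be applied with Python's re module to build a similar row-per-string layout. This is only a rough analogue, not part of stringr; the variable names and the None placeholders for non-matching strings are assumptions.

```python
import re

s = ["(sometext :: 0.1231313213)", "(moretext :: 0.111222)"]
pattern = re.compile(r"\((.*?) :: (0\.[0-9]+)\)")

rows = []
for item in s:
    m = pattern.search(item)
    if m:
        # Column 1 is the whole match, then one column per capture group.
        rows.append([m.group(0), *m.groups()])
    else:
        # Placeholder row for strings that do not match (an assumption).
        rows.append([None] * (pattern.groups + 1))

for row in rows:
    print(row)
```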