How can I remove repeated characters in a string with R?

I haven't thought about this very carefully, but here is a quick solution using backreferences in a regular expression:

gsub('([[:alpha:]])\\1+', '\\1', 'Buenaaaaaaaaa Suerrrrte')
# [1] "Buena Suerte"

The parentheses () capture a single letter, \\1 refers back to that captured letter, and + matches it one or more times. Put together, the pattern matches any letter that occurs two or more times in a row, and the replacement \\1 keeps just one copy of it.
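To see what the pattern actually matches before anything is replaced, the runs can be pulled out with regmatches() and gregexpr(). This is only an illustrative sketch; the input string and the variable names are made up for the example.

```r
# Same pattern as in the gsub() call: one letter followed by
# one or more copies of itself.
pattern <- "([[:alpha:]])\\1+"

# Illustrative input, not from the original question.
x <- "heeey yoo"

# Extract every run the pattern matches.
runs <- regmatches(x, gregexpr(pattern, x))[[1]]
print(runs)
# [1] "eee" "oo"
```

Letters that appear only once ("h" and "y" here) are not matched, so gsub() leaves them alone.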

To include characters other than letters, replace [[:alpha:]] with a regex matching whatever you wish to include.
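As a rough sketch of that substitution, the character class can be made a parameter. The helper name squeeze and its class argument are hypothetical and are not part of the original answer.

```r
# Hypothetical helper: collapse runs of any character matched by `class`.
# `class` is a regex fragment that matches a single character.
squeeze <- function(x, class = "[[:alpha:]]") {
  pattern <- paste0("(", class, ")\\1+")  # captured char, then repeats of it
  gsub(pattern, "\\1", x)
}

squeeze("Buenaaaaaaaaa Suerrrrte")
# [1] "Buena Suerte"

# Collapse repeated punctuation and whitespace, leaving letters and digits alone.
squeeze("woooow!!!  1000", class = "[[:punct:][:space:]]")
# [1] "woooow! 1000"

# Collapse runs of any character at all.
squeeze("aaa111---", class = ".")
# [1] "a1-"
```

Keep in mind that this also collapses legitimate doubled letters, so "book" would become "bok".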
