Remove all punctuation except apostrophes in R

    x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
    gsub("[^[:alnum:][:space:]']", "", x)
    [1] "I like to chew gum but don't like bubble gum"

The regex above is much more straightforward. The leading caret (^) negates the bracket expression, so the pattern matches every character that is not alphanumeric, whitespace, or an apostrophe, and gsub replaces each match with an empty string.
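As a minimal sketch of how the same pattern behaves on a character vector, the snippet below wraps it in a helper. The name strip_punct and the sample strings are hypothetical, chosen only for illustration; the pattern itself is the one shown above.

```r
# Hypothetical helper: keep only letters, digits, whitespace, and apostrophes.
strip_punct <- function(x) {
  gsub("[^[:alnum:][:space:]']", "", x)
}

texts <- c("Hello, world!", "It's 5 o'clock; time to go...", "rock & roll")

# gsub is vectorised, so every element is cleaned in one call.
cleaned <- strip_punct(texts)
print(cleaned)

# Removing a symbol that sat between two spaces leaves a double space;
# collapsing runs of whitespace is an optional extra step.
squeezed <- gsub("[[:space:]]+", " ", cleaned)
print(squeezed)
```

Note that the pattern removes characters without inserting anything, so "rock & roll" becomes "rock  roll" with two spaces until the whitespace is collapsed.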

How to remove Unicode from a string?

I just want to remove the Unicode escape <U+00A6>, which is at the beginning of the string.

In that case you do not need gsub; you can use sub with the pattern "^\\s*<U\\+\\w+>\\s*":

    q <- "<U+00A6> 1000-66329"
    sub("^\\s*<U\\+\\w+>\\s*", "", q)

Pattern details:

^ – start of string
\\s* – zero or more whitespace characters
<U\\+ – the literal character sequence <U+
…
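The anchored pattern can be tried on a few inputs to see what it does and does not touch. This is a sketch only: the extra test strings are made up for illustration, and it assumes the string contains the literal text <U+00A6> rather than the actual character U+00A6.

```r
# Sample inputs (hypothetical, apart from the first, which comes from the question).
q <- c(
  "<U+00A6> 1000-66329",   # escape at the very start
  "  <U+00A6>1000-66329",  # leading whitespace before the escape
  "1000 <U+00A6> 66329"    # escape in the middle of the string
)

# sub replaces at most one match per element, and ^ anchors that match to the
# start of the string, so an escape in the middle is left untouched.
res <- sub("^\\s*<U\\+\\w+>\\s*", "", q)
print(res)
```

Because of the ^ anchor, gsub with this exact pattern would give the same result; a pattern without the anchor would be needed to strip such escapes anywhere in the string.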