I just want to remove unicode <U+00A6> which is at the beginning of string.
Then you do not need gsub; a single sub call with the "^\\s*<U\\+\\w+>\\s*" pattern is enough, since only one match at the start of the string has to be removed:
q <- "<U+00A6> 1000-66329"
sub("^\\s*<U\\+\\w+>\\s*", "", q)
# [1] "1000-66329"
Pattern details:
^ – start of string
\\s* – zero or more whitespaces
<U\\+ – a literal char sequence <U+
\\w+ – 1 or more letters, digits or underscores
> – a literal >
\\s* – zero or more whitespaces.
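As a rough illustration of how the anchored pattern behaves on several strings at once, here is a minimal sketch. Only the first element of x comes from the question; the other inputs are made-up test cases, and the variable names are just placeholders.

```r
# Hypothetical sample input; only the first element is from the question.
x <- c("<U+00A6> 1000-66329",   # token at the start, followed by a space
       "  <U+00A6>abc",         # leading whitespace before the token
       "no prefix",             # nothing to remove
       "mid <U+00A6> token")    # token not at the start: left untouched

# sub() replaces at most one match per string, and ^ pins it to the start.
cleaned <- sub("^\\s*<U\\+\\w+>\\s*", "", x)
print(cleaned)
```

Because of the ^ anchor, a <U+...> token that appears later in a string is not removed, which is why sub is sufficient here.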
If you also need to replace the - with a space, add a |- alternative and use gsub, since now several replacements are expected and the replacement must be a space (the same as in akrun's answer). Wrap the call in trimws to drop the leading spaces the replacement leaves behind:
trimws(gsub("^\\s*<U\\+\\w+>|-", " ", q))
See the R online demo
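To see why the trimws call matters in the second variant, the two steps can be separated. This is only a sketch: step1, step2 and the second input string q2 are made up for illustration.

```r
q <- "<U+00A6> 1000-66329"

# The token and the hyphen are each replaced by one space, so the space
# that originally followed the token is still there after this step.
step1 <- gsub("^\\s*<U\\+\\w+>|-", " ", q)

# trimws() strips the leading (and any trailing) whitespace.
step2 <- trimws(step1)

# Hypothetical input with more than one hyphen: gsub replaces all of them.
q2 <- "<U+00A6> 10-20-30"
step3 <- trimws(gsub("^\\s*<U\\+\\w+>|-", " ", q2))

print(step2)
print(step3)
```

Note that the |- alternative is not anchored, so every hyphen in the string becomes a space, not only the first one.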