This can be done with a regular expression and R's grep family of functions (here, gsub):
cleanFun <- function(htmlString) {
  return(gsub("<.*?>", "", htmlString))
}
This will also work with multiple HTML tags in the same string!
This finds every instance of the pattern <.*?> in htmlString and replaces it with the empty string "". The ? in .*? makes the match non-greedy, so if you have multiple tags (e.g., <a> junk </a>) it will match <a> and </a> separately instead of the whole string.
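
To make the greedy versus non-greedy difference concrete, here is a small sketch. It repeats the cleanFun helper from above; the sample string x is made up purely for illustration.

```r
# Same helper as defined in the answer above.
cleanFun <- function(htmlString) {
  gsub("<.*?>", "", htmlString)
}

# Hypothetical sample input containing several tags.
x <- "<a href='#'>junk</a> and <b>more</b>"

# Non-greedy pattern: each tag is matched and removed on its own.
cleanFun(x)
# [1] "junk and more"

# Greedy pattern: a single match runs from the first "<" to the last ">",
# so the text between the tags is removed as well.
gsub("<.*>", "", x)
# [1] ""
```

Because gsub is vectorised, cleanFun can also be applied directly to a character vector or a data frame column without a loop.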