Also, is there any way to optimize this regex?
Yes: don't use a regex for this task. Use StringEscapeUtils from Apache Commons Lang instead:
import org.apache.commons.lang.StringEscapeUtils;
...
String withCharacters = StringEscapeUtils.unescapeHtml(yourString);
JavaDoc says:
Unescapes a string containing entity escapes to a string containing
the actual Unicode characters corresponding to the escapes. Supports
HTML 4.0 entities.
For example, the string "&lt;Fran&ccedil;ais&gt;" will become "<Français>".
If an entity is unrecognized, it is left alone, and inserted verbatim into the result string, e.g. "&gt;&zzzz;x" will become ">&zzzz;x".
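To make the documented behaviour concrete, here is a minimal, stdlib-only sketch of the same idea. It is not the Commons Lang implementation: the class name `EntitySketch`, the `unescape` method, and the five-entry entity table are all made up for illustration, and numeric entities such as `&#233;` are not handled. In real code you would still call the library method shown above.

```java
import java.util.Map;

// Hypothetical illustration only; NOT how Commons Lang implements unescapeHtml.
final class EntitySketch {

    // Assumption: a tiny sample table. The library covers the full HTML 4.0 entity set.
    private static final Map<String, String> ENTITIES = Map.of(
            "lt", "<",
            "gt", ">",
            "amp", "&",
            "quot", "\"",
            "ccedil", "\u00E7");

    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '&') {
                // Look for the terminating ';' of a candidate entity.
                int semi = input.indexOf(';', i + 1);
                if (semi > 0) {
                    String replacement = ENTITIES.get(input.substring(i + 1, semi));
                    if (replacement != null) {
                        out.append(replacement);
                        i = semi + 1;
                        continue;
                    }
                }
            }
            // Unknown entity or ordinary character: copy it through verbatim.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("&lt;Fran&ccedil;ais&gt;"));
        System.out.println(unescape("&gt;&zzzz;x"));
    }
}
```

The two calls in `main` mirror the Javadoc examples: the first input decodes fully, while in the second `&zzzz;` is not in the table, so it is copied through unchanged.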