Replace HTML codes with equivalent characters in Java [duplicate]

Also, is there any way to optimize this regex?

Yes: don’t use a regex for this task. Use StringEscapeUtils from Apache Commons Lang:

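// Commons Lang 2.x package; in Commons Lang 3 the class lives in
// org.apache.commons.lang3 and the method is named unescapeHtml4.
import org.apache.commons.lang.StringEscapeUtils;
...
String withCharacters = StringEscapeUtils.unescapeHtml(yourString);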

JavaDoc says:

Unescapes a string containing entity escapes to a string containing
the actual Unicode characters corresponding to the escapes. Supports
HTML 4.0 entities.

For example, the string "&lt;Fran&ccedil;ais&gt;" will become "<Français>"

If an entity is unrecognized, it is left alone, and inserted verbatim into the result string. e.g. "&gt;&zzzz;x" will become ">&zzzz;x".
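
To make the behaviour described in the JavaDoc concrete, here is a minimal, dependency-free sketch that reproduces both examples above. It is not how Commons Lang implements unescapeHtml: the class name EntityUnescapeSketch, the unescape method and the five-entry entity table are all hypothetical and exist only for illustration. It assumes Java 9 or later (for Map.of) and handles neither numeric entities such as &#233; nor the full HTML 4.0 entity set.

```java
import java.util.Map;

// Illustration only: a hand-rolled stand-in, NOT the Commons Lang implementation.
final class EntityUnescapeSketch {

    // Hypothetical, hand-picked subset of the HTML 4.0 named entities.
    private static final Map<String, String> ENTITIES = Map.of(
            "lt", "<",
            "gt", ">",
            "amp", "&",
            "quot", "\"",
            "ccedil", "\u00E7");

    // Replaces known "&name;" sequences in a single pass; anything else is copied verbatim.
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '&') {
                int semi = input.indexOf(';', i + 1);
                if (semi > i + 1) {
                    String replacement = ENTITIES.get(input.substring(i + 1, semi));
                    if (replacement != null) {
                        out.append(replacement);
                        i = semi + 1; // skip past the whole entity
                        continue;
                    }
                }
            }
            // Not a recognized entity: keep the character as-is.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("&lt;Fran&ccedil;ais&gt;")); // <Français>
        System.out.println(unescape("&gt;&zzzz;x"));             // >&zzzz;x
    }
}
```

For real input, the library call is still the better choice, since it covers the complete entity table and numeric escapes that this sketch ignores.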
