Convert HTML character entities back to regular text using JavaScript

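You could do something like this:

    String.prototype.decodeHTML = function() {
        // Named entities to decode; extend the map as needed.
        var map = {"gt":">" /* , … */};
        return this.replace(/&(#(?:x[0-9a-f]+|\d+)|[a-z]+);?/gi, function($0, $1) {
            if ($1[0] === "#") {
                // Numeric reference: hexadecimal (&#x3E;) or decimal (&#62;).
                return String.fromCharCode($1[1].toLowerCase() === "x" ? parseInt($1.substr(2), 16) : parseInt($1.substr(1), 10));
            } else {
                // Named reference: leave the text untouched if the name is not in the map.
                return map.hasOwnProperty($1) ? map[$1] : $0;
            }
        });
    };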

Is there a Java XML API that can parse a document without resolving character entities?

The StAX API supports not replacing entity references, by way of the IS_REPLACING_ENTITY_REFERENCES property, which is documented as: "Requires the parser to replace internal entity references with their replacement text and report them as characters." Setting it to false asks the parser to leave the references unreplaced. The property can be set on an XMLInputFactory, which is then in turn used to construct an XMLEventReader or …
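
As a rough illustration of the property described above, here is a minimal sketch that sets IS_REPLACING_ENTITY_REFERENCES to false and collects the names of any entity references the parser reports as events. The class name EntityReferenceSketch and the method unexpandedEntityNames are hypothetical, chosen only for this example. Which references are actually left unexpanded can depend on the StAX implementation in use, so treat this as a starting point rather than guaranteed behaviour.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper for illustration; not part of the StAX API or the original answer.
final class EntityReferenceSketch {

    // Parses the given XML and returns the names of entity references
    // that the parser reported as events instead of expanding.
    static List<String> unexpandedEntityNames(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser not to replace internal entity references with their text.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);

        List<String> names = new ArrayList<>();
        try {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isEntityReference()) {
                    names.add(((EntityReference) event).getName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Wrapped so callers do not have to handle the checked exception.
            throw new IllegalStateException("Could not parse XML", e);
        }
        return names;
    }
}
```

Calling unexpandedEntityNames on a document that declares an entity in its internal DTD subset and then references it in element content should, assuming the implementation honours the property, return that entity's name instead of silently expanding it into the surrounding text.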