Parsing XML file containing HTML entities in Java without changing the XML

I would use a library like Jsoup for this purpose. I tested the code below and it works, so hopefully it helps. Jsoup can be located here:

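import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public static void main(String[] args) {
    String html = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo>" +
                  "<bar>Some&nbsp;text &mdash; invalid!</bar></foo>";
    // Use Jsoup's XML parser rather than the default HTML parser
    Document doc = Jsoup.parse(html, "", Parser.xmlParser());

    for (Element e : doc.select("bar")) {
        System.out.println(e);
    }
}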


 Some&nbsp;text — invalid!

An example of loading from a file can be found here:
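For the file case, here is a minimal sketch of how it might look. It assumes the `Jsoup.parse(InputStream, String, String, Parser)` overload and a UTF-8 encoded file. The class name `XmlFileExample`, the helper `parseXmlFile`, and the temporary sample file are made up for illustration and are not part of the original answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class XmlFileExample {

    // Hypothetical helper: reads the file as UTF-8 and parses it with
    // Jsoup's XML parser instead of the default HTML parser.
    static Document parseXmlFile(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return Jsoup.parse(in, "UTF-8", "", Parser.xmlParser());
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file so the sketch can run on its own.
        Path path = Files.createTempFile("sample", ".xml");
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo>"
                   + "<bar>Some&nbsp;text &mdash; invalid!</bar></foo>";
        Files.write(path, xml.getBytes(StandardCharsets.UTF_8));

        Document doc = parseXmlFile(path);
        for (Element e : doc.select("bar")) {
            System.out.println(e);
        }

        Files.delete(path);
    }
}
```

In real use you would pass the path of your own XML file to `parseXmlFile` and skip the temporary file. How entities are written back out may depend on the Jsoup version and its output settings, so it is worth checking the printed result against your input.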
