Library to query HTML with XPath in Java?

There are several different approaches to this documented on the Web:

HtmlCleaner plus the Java DOM parser, as described in "Using XPath Contains against HTML in Java" (this is the approach I recommend).
HtmlCleaner itself has a built-in utility that supports XPath. See the javadocs at http://htmlcleaner.sourceforge.net/doc/org/htmlcleaner/XPather.html or this example: http://thinkandroid.wordpress.com/2010/01/05/using-xpath-and-html-cleaner-to-parse-html-xml/
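In the HtmlCleaner plus DOM route, the XPath step is ordinary JAXP evaluation over a W3C DOM. Below is a minimal sketch of only that step, using nothing but the JDK. It assumes the page has already been cleaned into well-formed XHTML; the cleaning itself is not shown. The `SAMPLE_XHTML` string, its example.com links, and the class and method names are hypothetical stand-ins, not real HtmlCleaner output or API. The query uses XPath `contains()`, with the search text passed in as an XPath variable rather than spliced into the expression string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CleanedHtmlXPath {

    // Hypothetical sample standing in for markup that a cleaner has already made well-formed.
    static final String SAMPLE_XHTML =
        "<html><body>"
        + "<a href=\"http://example.com/htmlcleaner\">HtmlCleaner home</a>"
        + "<a href=\"http://example.com/jericho\">Jericho HTML Parser</a>"
        + "<p>No link here</p>"
        + "</body></html>";

    /** Parses well-formed XHTML into a W3C DOM Document with the JDK's built-in parser. */
    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false); // plain element names, so XPath needs no prefixes
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            // Raw, uncleaned HTML usually ends up here because it is not well-formed XML.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /** Returns the href of every <a> element whose text contains the given substring. */
    static List<String> hrefsOfLinksContaining(Document doc, String text) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $needle to the search text so quotes in the text cannot break the expression.
        xpath.setXPathVariableResolver(name -> "needle".equals(name.getLocalPart()) ? text : null);
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                "//a[contains(., $needle)]", doc, XPathConstants.NODESET);
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                hrefs.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return hrefs;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE_XHTML);
        System.out.println(hrefsOfLinksContaining(doc, "Jericho"));
    }
}
```

Running `main` should print the single matching link. Parsing raw HTML with the JDK parser directly tends to fail on unclosed tags and similar markup, which is why a cleaning step such as HtmlCleaner comes first in this combination.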

Jericho and Jaxen
http://sujitpal.blogspot.com/2009/04/xpath-over-html-using-jericho-and-jaxen.html

I have tried a few variations of these approaches, e.g. HtmlParser plus the Java DOM parser, and JSoup plus Jaxen, but the combination that worked best was HtmlCleaner plus the Java DOM parser. The next best combination was Jericho plus Jaxen.
