Reading an HTML file into a DOM tree using Java

Use JTidy. You can either process the stream to XHTML and then re-parse it with your favourite DOM implementation, or call parseDOM if the limited DOM implementation that JTidy gives you is enough.
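The re-parse half of the first route can be done with the JDK's built-in JAXP parser, so no extra library is needed once you have the XHTML. The sketch below assumes JTidy has already written its XHTML output into a String. The class name XhtmlDomLoader and its parse method are hypothetical. Because JTidy's XHTML output may carry a DOCTYPE that points at the W3C DTD, the sketch turns off external DTD loading so the parser does not try to download it. One further assumption: with the DTD skipped, named entities such as &nbsp; are undefined, so the XHTML should use numeric character references instead.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: re-parses well-formed XHTML (e.g. JTidy output) into a full JAXP DOM. */
final class XhtmlDomLoader {

    static Document parse(String xhtml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // XHTML elements live in the http://www.w3.org/1999/xhtml namespace.
        factory.setNamespaceAware(true);
        // Do not fetch the external XHTML DTD named in the DOCTYPE (slow and often blocked).
        // This feature URI is understood by the JDK's bundled Xerces-based parser.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for XHTML that JTidy would have produced from the original HTML.
        String xhtml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n"
                + "  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Hello</title></head>"
                + "<body><p>First</p><p>Second</p></body></html>";
        Document doc = parse(xhtml);
        // Print the page title and the number of <p> elements.
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
        System.out.println(doc.getElementsByTagName("p").getLength());
    }
}
```

The payoff over parseDOM is that the result is an ordinary org.w3c.dom.Document from the JDK, so XPath, transformers and the rest of the standard XML tooling work on it unchanged.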

Alternatively, use NekoHTML (Neko), whose DOMParser builds a standard org.w3c.dom DOM tree directly from HTML.
