Parse Web Site HTML with JAVA [duplicate]

There is a much easier way to do this. I suggest using jsoup, which fetches a page and parses it in one call and lets you query the result with CSS selectors. For example:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

Or if you want the body:

Elements body = doc.select("body");

Or if you want all links:

Elements links = doc.select("body a");

You no longer need to open connections or handle streams yourself. If you have ever used jQuery, the selector syntax will feel very familiar.
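To show the snippets above in a complete program, here is a minimal sketch that collects the absolute URL of every link in a document. It assumes jsoup is on the classpath. The class name LinkExtractor, the method extractLinks, and the sample HTML are made up for illustration and are not part of jsoup. It parses an inline string with Jsoup.parse so that it runs without network access, and it uses the selector "a[href]" instead of "body a" so that anchors without an href are skipped.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper for illustration; not part of jsoup itself.
final class LinkExtractor {

    // Returns the absolute URL of every <a> element that has an href attribute.
    static List<String> extractLinks(String html, String baseUri) {
        // Parse from a string; baseUri is used to resolve relative hrefs.
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for a downloaded page.
        String html = "<html><body>"
                + "<a href=\"/wiki/Java\">Java</a>"
                + "<a name=\"top\">no href here</a>"
                + "<a href=\"https://example.org/\">Example</a>"
                + "</body></html>";
        for (String url : extractLinks(html, "http://en.wikipedia.org/")) {
            System.out.println(url);
        }
    }
}
```

To run this against a live page, you could swap Jsoup.parse(html, baseUri) for Jsoup.connect(url).get(), as in the snippets above. Note that get() throws IOException, so the caller has to handle or declare it.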
