Text Extraction from HTML Java
jsoup

Another HTML parser I really liked using was jsoup. You can get all the <p> elements in two lines of code:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements ps = doc.select("p");

Then write the text out to a file in one more line:

out.write(ps.text()); // joins the text of all the <p> elements into one long string
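For reference, here is a minimal self-contained sketch of the same idea. It assumes jsoup is on the classpath, and it parses an inline HTML string instead of fetching a page so that it runs without network access. The class name `ParagraphExtractor`, the helper `extractParagraphText`, the sample HTML, and the temp-file output are all hypothetical choices made for illustration, not part of the original snippet.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ParagraphExtractor {

    // Hypothetical helper: returns the combined text of every <p> element.
    static String extractParagraphText(String html) {
        Document doc = Jsoup.parse(html);   // parse a string instead of Jsoup.connect(url).get()
        Elements ps = doc.select("p");      // CSS selector matching all <p> elements
        return ps.text();                   // combined text of the matched elements
    }

    public static void main(String[] args) throws IOException {
        // Sample input, made up for this sketch.
        String html = "<html><body><h1>Title</h1>"
                + "<p>First paragraph.</p><p>Second <b>one</b>.</p>"
                + "</body></html>";

        String text = extractParagraphText(html);

        // Write the extracted text to a temporary file (assumed output location).
        Path out = Files.createTempFile("paragraphs", ".txt");
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));

        System.out.println("Wrote " + text.length() + " characters to " + out);
    }
}
```

To fetch a live page as in the snippet above, `Jsoup.parse(html)` could be swapped for `Jsoup.connect(url).get()`, which throws `IOException` on network failure. If one string per paragraph is preferable to a single long string, `Elements.eachText()` returns the texts as a list.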