Stripping HTML tags in Java [duplicate]

Use JSoup. It’s well documented, available on Maven, and after a day spent trying several libraries it is, for me, the best one I can imagine. In my opinion, a job like this, converting HTML into plain text, should be possible in one line of code; otherwise the library has failed somehow… just saying ^^ So here is the JSoup one-liner. In Markdown4J nothing like this is possible, nor in Markdownj, and in HtmlCleaner it is a real pain, taking roughly 50 lines of code.

String plain = new HtmlToPlainText().getPlainText(Jsoup.parse(html));

And what you get is real plain text (not just the HTML source code as a String, as with other libs, lol). It really does a great job of that. The quality is more or less the same as Markdownify for PHP.
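For anyone who wants to try it, here is a minimal sketch of the one-liner in a runnable class. It assumes the jsoup jar is on the classpath and that your jsoup version ships the `org.jsoup.examples` package, which is where `HtmlToPlainText` lives as far as I know. The class name `PlainTextDemo` and the two helper methods are made up for illustration. `flatText` is not from the answer above: it is an alternative that uses only `Jsoup.parse(...).text()` from the core API, which returns the text with whitespace normalized and no line breaks kept.

```java
import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;

class PlainTextDemo {

    // The one-liner from above, wrapped in a hypothetical helper.
    // HtmlToPlainText is assumed to be available from org.jsoup.examples.
    static String formattedText(String html) {
        return new HtmlToPlainText().getPlainText(Jsoup.parse(html));
    }

    // Alternative using only the core jsoup API: all text in the
    // document, whitespace-normalized, without paragraph breaks.
    static String flatText(String html) {
        return Jsoup.parse(html).text();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p><p>Second paragraph</p>";
        System.out.println(formattedText(html));
        System.out.println(flatText(html));
    }
}
```

If you only need the words and don't care about paragraph breaks or list bullets, `flatText` is probably enough; `formattedText` is the one to use when the layout of the text matters.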
