Problems submitting a login form with Jsoup

Besides the username, the password and the cookies, the site requires two additional values for the login: __VIEWSTATE and __EVENTVALIDATION. You can get them from the response of the first GET request (loginForm below), like this:

    Document doc = loginForm.parse();
    Element e = doc.select("input[id=__VIEWSTATE]").first();
    String viewState = e.attr("value");
    e = doc.select("input[id=__EVENTVALIDATION]").first();
    String eventValidation = e.attr("value");

And …
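A minimal sketch of the whole round trip follows, assuming jsoup is on the classpath. The login URL and the "username"/"password" form field names are placeholders (the real names come from the site's form), and the class name AspNetLogin is hypothetical. It reuses the cookies from the GET on the POST and echoes both hidden fields back, skipping a field if the page does not contain it.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AspNetLogin {

    /** Returns the value attribute of the input with the given id, or null if it is absent. */
    public static String hiddenValue(Document doc, String id) {
        Element e = doc.select("input[id=" + id + "]").first();
        return e == null ? null : e.attr("value");
    }

    /** Collects the ASP.NET state fields that the login POST has to send back. */
    public static Map<String, String> stateFields(Document loginPage) {
        Map<String, String> fields = new HashMap<>();
        for (String id : new String[] {"__VIEWSTATE", "__EVENTVALIDATION"}) {
            String value = hiddenValue(loginPage, id);
            if (value != null) {
                fields.put(id, value); // assumes name == id, as ASP.NET renders these hidden fields
            }
        }
        return fields;
    }

    /** Hypothetical login flow; url and the credential field names are placeholders. */
    public static Connection.Response login(String url, String user, String pass) throws IOException {
        // 1. GET the login page: this yields the session cookies and the hidden fields.
        Connection.Response loginForm = Jsoup.connect(url)
                .method(Connection.Method.GET)
                .execute();
        Document doc = loginForm.parse();

        Map<String, String> form = stateFields(doc);
        form.put("username", user); // placeholder field name
        form.put("password", pass); // placeholder field name

        // 2. POST credentials plus state fields, sending the GET's cookies back.
        return Jsoup.connect(url)
                .data(form)
                .cookies(loginForm.cookies())
                .method(Connection.Method.POST)
                .execute();
    }
}
```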

JSoup character encoding issue

The charset attribute is missing from the Content-Type header of the HTTP response, so Jsoup will resort to the platform default charset when parsing the HTML. Document.OutputSettings#charset() won't help, because it is used for presentation only (by html() and text()), not for parsing the data; by the time it applies, the bytes have already been decoded. You need to read the URL as an InputStream …
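A minimal sketch of the InputStream approach, assuming jsoup is on the classpath and that you already know the page's real charset (nothing here detects it; you supply it). The class name CharsetParse is hypothetical. Jsoup.parse(InputStream, String, String) decodes the bytes with the charset you pass, so the missing header no longer matters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CharsetParse {

    /** Decodes the HTML bytes with an explicitly chosen charset instead of relying on a fallback. */
    public static Document parse(InputStream in, String charset, String baseUri) throws IOException {
        return Jsoup.parse(in, charset, baseUri);
    }

    /** Opens the URL as a plain InputStream and parses it with the given charset. */
    public static Document fetch(String url, String charset) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return parse(in, charset, url);
        }
    }
}
```

For example, CharsetParse.fetch("http://www.example.com/", "UTF-8") would decode that page as UTF-8 regardless of what the server's header says; the URL and charset there are placeholders.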

jsoup posting and cookie

When you log in to the site, it probably sets an authorised session cookie that needs to be sent on subsequent requests to maintain the session. You can get the cookie like this:

    Connection.Response res = Jsoup.connect("http://www.example.com/login.php")
        .data("username", "myUsername", "password", "myPassword")
        .method(Method.POST)
        .execute();
    Document doc = res.parse();
    String sessionId = res.cookie("SESSIONID"); // you will need …
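To stay logged in, each later request has to carry the cookie back. A minimal sketch, assuming a plain cookie-based session; the URLs and the username/password field names are the placeholders from the login example, and SessionExample is a hypothetical name. Forwarding the whole res.cookies() map means you don't have to know the session cookie's exact name (SESSIONID is only an example).

```java
import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SessionExample {

    /** POSTs the login form and returns every cookie the server set, session cookie included. */
    public static Map<String, String> login(String loginUrl, String user, String pass) throws IOException {
        Connection.Response res = Jsoup.connect(loginUrl)
                .data("username", user, "password", pass) // placeholder field names
                .method(Connection.Method.POST)
                .execute();
        return res.cookies();
    }

    /** Prepares a request carrying the session cookies; nothing is sent until get() or execute(). */
    public static Connection withSession(String url, Map<String, String> cookies) {
        return Jsoup.connect(url).cookies(cookies);
    }

    /** Fetches a page as the logged-in user. */
    public static Document fetchLoggedIn(String url, Map<String, String> cookies) throws IOException {
        return withSession(url, cookies).get();
    }
}
```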

How do I preserve line breaks when using jsoup to convert html to plain text?

The real solution that preserves line breaks looks like this:

    public static String br2nl(String html) {
        if (html == null)
            return html;
        Document document = Jsoup.parse(html);
        document.outputSettings(new Document.OutputSettings().prettyPrint(false)); // makes html() preserve line breaks and spacing
        document.select("br").append("\\n");    // literal backslash-n marker appended to each <br>
        document.select("p").prepend("\\n\\n"); // two markers prepended to each <p>
        String s = document.html().replaceAll("\\\\n", "\n"); // turn the markers into real newlines
        // Whitelist is deprecated since jsoup 1.14.1; Safelist.none() is its replacement
        return Jsoup.clean(s, "", Whitelist.none(), new Document.OutputSettings().prettyPrint(false));
    }

It satisfies the following requirements: if the original HTML contains a newline (\n), …
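As an alternative to br2nl's marker-and-replace round trip, the same output can be sketched by walking the parsed tree and emitting newlines directly. This is a swapped-in technique, not the answer's method. It assumes a jsoup version that has the static NodeTraversor.traverse(NodeVisitor, Node) method (recent releases do), and the class name PlainText is hypothetical. Like br2nl, it writes "\n" for each <br> and "\n\n" before each <p>, and it keeps newlines that are already present in text nodes.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

public class PlainText {

    /** Converts HTML to plain text, mapping <br> to one newline and <p> to a preceding blank line. */
    public static String toPlainText(String html) {
        if (html == null) return null;
        StringBuilder sb = new StringBuilder();
        NodeTraversor.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    // getWholeText() keeps the original whitespace, including existing newlines
                    sb.append(((TextNode) node).getWholeText());
                } else if (node instanceof Element) {
                    String tag = ((Element) node).tagName();
                    if (tag.equals("br")) sb.append('\n');
                    else if (tag.equals("p")) sb.append("\n\n");
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // nothing to emit when leaving a node
            }
        }, Jsoup.parse(html).body());
        return sb.toString();
    }
}
```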