Web scraping with jsoup in Kotlin

I was able to scrape what you were looking for from this page on that same site.
Even if it’s not what you want, the procedure may help someone in the future.

Here is how I did that:

  1. First I opened that page
  2. Then I opened the Chrome developer tools by pressing CTRL+SHIFT+I, or
     by right-clicking somewhere on the page and selecting Inspect, or
     by clicking ⋮ ➜ More tools ➜ Developer tools
  3. Next I selected the Network tab
  4. And finally I refreshed the page with F5 or with the refresh button ⟳

A list of requests starts to appear (the network log), and after a few seconds all of them finish loading. Here, we want to find and inspect a request whose Type is xhr. We can filter requests by clicking the filter icon and then selecting the desired type.

To inspect a request, click on its name (the first column from the left).

Clicking one of the XHR requests and then selecting the Response tab shows that the response contains exactly what we are looking for. It is HTML, so jsoup can parse it.

Here is that response (if you want to copy or manipulate it):

<div style="vertical-align:top;">
  <div>
    <div style="float:left; width:120px; font-weight:bold;">
      Next Jackpot
    </div>
    <span style="color:#EC243D; font-weight:bold">$8,000,000 est</span>
  </div>
  <div>
    <div style="float:left; width:120px; font-weight:bold;">
      Next Draw
    </div>
    <div class="toto-draw-date">Mon, 15 Nov 2021 , 9.30pm</div>
  </div>
</div>
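Before touching the network, the selectors can be tried against that captured markup. This is only a minimal sketch: it assumes jsoup is on the classpath, and `sampleResponse`, `extractJackpot` and `extractDrawDate` are names made up for the illustration. It relies on the fragment containing a single `span` and a `toto-draw-date` class, as in the response above.

```kotlin
import org.jsoup.Jsoup

// The response captured from the Network tab, kept as a string for offline testing.
val sampleResponse = """
    <div style="vertical-align:top;">
      <div>
        <div style="float:left; width:120px; font-weight:bold;">Next Jackpot</div>
        <span style="color:#EC243D; font-weight:bold">$8,000,000 est</span>
      </div>
      <div>
        <div style="float:left; width:120px; font-weight:bold;">Next Draw</div>
        <div class="toto-draw-date">Mon, 15 Nov 2021 , 9.30pm</div>
      </div>
    </div>
""".trimIndent()

// Hypothetical helper: text of the first <span>, or null if there is none.
fun extractJackpot(html: String): String? =
    Jsoup.parse(html).selectFirst("span")?.text()

// Hypothetical helper: text of the element with class "toto-draw-date", or null.
fun extractDrawDate(html: String): String? =
    Jsoup.parse(html).selectFirst("div.toto-draw-date")?.text()

fun main() {
    println(extractJackpot(sampleResponse))  // $8,000,000 est
    println(extractDrawDate(sampleResponse)) // Mon, 15 Nov 2021 , 9.30pm
}
```

Selecting by class name instead of by position should hold up a little better if the site adds or reorders wrapper elements, though any change to the markup can still break it.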

Selecting the Headers tab (to the left of the Response tab) shows that the Request URL is https://www.singaporepools.com.sg/DataFileArchive/Lottery/Output/toto_next_draw_estimate_en.html?v=2021y11m14d21h0m, the Request Method is GET, and the Content-Type is, again, text/html.
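The `v` value in that URL looks like a timestamp (year, month, day, hour, minute, without zero padding). That reading is my assumption, not something the site documents, and I have not checked whether the server needs the parameter at all or which time zone it would use. Under that assumption, a value of the same shape could be built like this (`versionParam` is a hypothetical helper):

```kotlin
import java.time.LocalDateTime

// Hypothetical helper: formats a timestamp in the shape of the `v` value seen
// in the captured URL, e.g. 2021y11m14d21h0m. The meaning of `v` is an assumption.
fun versionParam(t: LocalDateTime): String =
    "${t.year}y${t.monthValue}m${t.dayOfMonth}d${t.hour}h${t.minute}m"

fun main() {
    val base = "https://www.singaporepools.com.sg/DataFileArchive/Lottery/Output/toto_next_draw_estimate_en.html"
    // The moment that matches the captured request
    val captured = LocalDateTime.of(2021, 11, 14, 21, 0)
    println("$base?v=${versionParam(captured)}")
}
```

For the captured moment this reproduces the query string seen in the Headers tab. Passing `LocalDateTime.now()` would give a current value, if the timestamp reading is right.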

So, with the URL and the HTTP method we found, here is the code to scrape that HTML:

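import org.jsoup.Jsoup

fun main() {
    // Fetch the fragment with the URL and HTTP method (GET) found in the Network tab
    val document = Jsoup
        .connect("https://www.singaporepools.com.sg/DataFileArchive/Lottery/Output/toto_next_draw_estimate_en.html?v=2021y11m14d21h0m")
        .userAgent("Mozilla")
        .get()

    // jsoup wraps the fragment in a full document; the body's only child is the outer <div>
    val targetElement = document
        .body()
        .children()
        .single()

    // The first inner <div> holds the "Next Jackpot" label together with the amount
    val phrase = targetElement.child(0).text()
    // The amount alone is in the only <span>; drop the trailing " est"
    val prize = targetElement.select("span").text().removeSuffix(" est")

    println(phrase) // Next Jackpot $8,000,000 est
    println(prize)  // $8,000,000
}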
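If the prize is needed as a number rather than as text, the digits can be pulled out of the string. A minimal sketch, assuming the amount is always whole dollars with comma separators as in the response above (`prizeToLong` is a hypothetical helper, not part of jsoup):

```kotlin
// Hypothetical helper: keeps only the digits of a string such as "$8,000,000"
// and returns them as a Long, or null when the string contains no digits.
fun prizeToLong(prize: String): Long? =
    prize.filter { it.isDigit() }.toLongOrNull()

fun main() {
    println(prizeToLong("$8,000,000"))     // 8000000
    println(prizeToLong("$8,000,000 est")) // 8000000
    println(prizeToLong("TBA"))            // null
}
```

Returning null instead of throwing lets the caller decide what to do when the page shows something other than an amount.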
