Simple Screen Scraping using jQuery

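Use $.ajax to load the other page as text, then create a temporary element and use .html() to set its contents to the returned markup. You can then search that element like any other DOM tree. For example, find the first <ul>, loop through its <li> children (element nodes, nodeType 1), and read each item's text. Browsers prevent scripts from reading responses from other domains unless that server explicitly allows it (CORS). So if the external page is not on your web server, you will need to proxy it through your own web server.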

Something like this:

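$.ajax({
     url: "/thePageToScrape.html",
     dataType: "text",
     success: function(data) {
          // Parse the returned markup into a temporary <div> that is never added to the page
          var $page = $("<div>").html(data);
          // Select only the direct <li> children of the first <ul> on the scraped page
          $page.find("ul").first().children("li").each(function() {
               // .text() returns the item's full text, even when its first child is an
               // element (such as a link) or the <li> is empty
               var theText = $(this).text();
               // Do something here
          });
     }
});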
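
One way to set up the proxy mentioned above is a small Node.js server that fetches the external page and returns it from your own origin. This is a minimal sketch, not part of the original technique. It assumes Node.js 18 or later (for the built-in fetch), and the function name createScrapeProxy is hypothetical. The page running the jQuery code must be served from the same origin as the proxy so that the relative URL reaches it.

```javascript
// Minimal same-origin proxy sketch for Node.js 18+ (relies on the built-in global fetch).
// createScrapeProxy is a hypothetical helper name, not part of jQuery or Node.
const http = require("http");

function createScrapeProxy(upstreamUrl) {
  return http.createServer(async (req, res) => {
    // Relay only the one path the jQuery code requests; everything else gets a 404.
    if (req.method !== "GET" || req.url !== "/thePageToScrape.html") {
      res.writeHead(404, { "Content-Type": "text/plain" });
      res.end("Not found");
      return;
    }
    try {
      // Fetch the external page server-side, where the browser's same-origin policy does not apply.
      const upstream = await fetch(upstreamUrl);
      const body = await upstream.text();
      // Pass the upstream status and markup back; the browser sees a same-origin response.
      res.writeHead(upstream.status, { "Content-Type": "text/html; charset=utf-8" });
      res.end(body);
    } catch (err) {
      // The other server could not be reached.
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("Bad gateway");
    }
  });
}
```

You might start it with something like createScrapeProxy("https://example.com/thePageToScrape.html").listen(8080), where the address is a placeholder for the real external page. The proxy relays a single fixed path rather than accepting an arbitrary URL parameter, so it cannot be abused as an open proxy to fetch any site.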
