Can scrapy be used to scrape dynamic content from websites that are using AJAX?

Here is a simple example of scrapy with an AJAX request. Let see the site rubin-kazan.ru.

All messages are loaded with an AJAX request. My goal is to fetch these messages with all their attributes (author, date, …):

enter image description here

When I analyze the source code of the page I can’t see all these messages because the web page uses AJAX technology. But I can with Firebug from Mozilla Firefox (or an equivalent tool in other browsers) to analyze the HTTP request that generate the messages on the web page:

enter image description here

It doesn’t reload the whole page but only the parts of the page that contain messages. For this purpose I click an arbitrary number of page on the bottom:

enter image description here

And I observe the HTTP request that is responsible for message body:

enter image description here

After finish, I analyze the headers of the request (I must quote that this URL I’ll extract from source page from var section, see the code below):

enter image description here

And the form data content of the request (the HTTP method is “Post”):

enter image description here

And the content of response, which is a JSON file:

enter image description here

Which presents all the information I’m looking for.

From now, I must implement all this knowledge in scrapy. Let’s define the spider for this purpose:

class spider(BaseSpider):
    name="RubiGuesst"
    start_urls = ['http://www.rubin-kazan.ru/guestbook.html']

    def parse(self, response):
        url_list_gb_messages = re.search(r'url_list_gb_messages="(.*)"', response.body).group(1)
        yield FormRequest('http://www.rubin-kazan.ru' + url_list_gb_messages, callback=self.RubiGuessItem,
                          formdata={'page': str(page + 1), 'uid': ''})

    def RubiGuessItem(self, response):
        json_file = response.body

In parse function I have the response for first request.
In RubiGuessItem I have the JSON file with all information.

Leave a Comment