Crawling multiple URLs in a loop using Puppeteer

map, forEach, reduce, etc. do not wait for the asynchronous operations inside their callbacks before proceeding to the next element of the iterable they are iterating over. There are multiple ways of going through each item sequentially while performing an asynchronous operation, but the easiest in this case I think would be …
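
One way to get the sequential behavior described above is a plain for...of loop with await inside it. This is only a minimal sketch: crawlSequentially and the visit callback are hypothetical names used for illustration and are not taken from the original answer.

```javascript
// Visit each URL strictly one after another.
// `visit` is a hypothetical async callback supplied by the caller.
async function crawlSequentially(urls, visit) {
  const results = [];
  for (const url of urls) {
    // `await` inside for...of pauses the loop until this URL is finished,
    // which forEach/map callbacks would not do.
    results.push(await visit(url));
  }
  return results;
}
```

With Puppeteer, the callback could be something like async (url) => { await page.goto(url); return page.title(); }, assuming page is an already opened Page.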

How do you click on an element with text in Puppeteer?

Short answer: This XPath expression will query a button which contains the text "Button text": const [button] = await page.$x("//button[contains(., 'Button text')]"); if (button) { await button.click(); } To also respect the <div class="elements"> surrounding the buttons, use the following code: const [button] = await page.$x("//div[@class='elements']/button[contains(., 'Button text')]"); Explanation: To explain why using the text …
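
The two queries above differ only in their prefix, so they can be wrapped in a small helper. The following is a sketch under assumptions: buttonXPath and clickButtonByText are hypothetical names, page is assumed to be a Puppeteer Page from a version that still provides page.$x (recent releases have removed it), and the text is assumed not to contain a single quote.

```javascript
// Build an XPath that matches a <button> whose text contains `text`,
// optionally restricted to buttons directly inside <div class="...">.
// Assumption: neither argument contains a single quote.
function buttonXPath(text, containerClass) {
  const prefix = containerClass ? `//div[@class='${containerClass}']/` : '//';
  return `${prefix}button[contains(., '${text}')]`;
}

// `page` is assumed to be a Puppeteer Page that still exposes page.$x.
async function clickButtonByText(page, text, containerClass) {
  const [button] = await page.$x(buttonXPath(text, containerClass));
  if (!button) return false; // nothing matched, nothing clicked
  await button.click();
  return true;
}
```

Returning a boolean rather than throwing keeps the "if (button)" guard from the answer visible to the caller.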

Why can’t I access ‘window’ in an exposeFunction() function with Puppeteer?

Evaluate the function: You can pass the dynamic script using evaluate. (async function(){ var puppeteer = require('puppeteer'); const browser = await puppeteer.launch(); const page = await browser.newPage(); var functionToInject = function(){ return window.navigator.appName; } var data = await page.evaluate(functionToInject); // <-- Just pass the function console.log(data); // outputs: Netscape await browser.close(); })() addScriptTag and readFileSync …
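
The reason this works is that page.evaluate runs the function inside the page, where window exists, whereas a function registered with exposeFunction runs in Node. A minimal sketch of the same idea as a reusable helper follows; readAppName and getAppName are hypothetical names, and page is assumed to be a Puppeteer Page created elsewhere.

```javascript
// This function is sent to the browser and executed in the page context,
// so `window` refers to the page's window, not to anything in Node.
function readAppName() {
  return window.navigator.appName;
}

// `page` is assumed to be a Puppeteer Page.
async function getAppName(page) {
  // The return value is serialized and handed back to Node.
  return page.evaluate(readAppName);
}
```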

How can I capture all network requests and full response data when loading a page in Chrome?

You can enable request interception with page.setRequestInterception(), and then, inside page.on('request'), you can use the request-promise-native module to act as a middleman that gathers the response data before continuing the request with request.continue() in Puppeteer. Here's a full working example: 'use strict'; const puppeteer = require('puppeteer'); const request_client = require('request-promise-native'); …
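
As a rough illustration of the interception flow, the sketch below logs each request and continues it unchanged. Note that it deliberately swaps the request-promise-native middleman from the answer for Puppeteer's own response event, so it is not the author's example. recordNetwork is a hypothetical name, and page is assumed to be a Puppeteer Page on which this is called before page.goto().

```javascript
// `page` is assumed to be a Puppeteer Page; attach before navigating.
async function recordNetwork(page) {
  const log = [];
  await page.setRequestInterception(true);

  page.on('request', (request) => {
    log.push({ type: 'request', method: request.method(), url: request.url() });
    request.continue(); // let the request proceed unchanged
  });

  page.on('response', async (response) => {
    const entry = { type: 'response', status: response.status(), url: response.url() };
    log.push(entry);
    try {
      entry.body = await response.text();
    } catch (err) {
      entry.body = null; // some responses (e.g. redirects) have no readable body
    }
  });

  return log; // filled in as the page loads
}
```

Because bodies are read asynchronously, the log is only complete once the page has finished loading and pending reads have settled.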

puppeteer: wait N seconds before continuing to the next line

You can use a little promise function: function delay(time) { return new Promise(function(resolve) { setTimeout(resolve, time) }); } Then, call it whenever you want a delay: console.log('before waiting'); await delay(4000); console.log('after waiting'); If you must use Puppeteer, use the built-in waitForTimeout function (deprecated, and removed in recent Puppeteer releases): await page.waitForTimeout(4000) If you still want to use page.evaluate, resolve it after 4 …

Why does headless need to be false for Puppeteer to work?

The reason it might work in UI mode but not headless is that sites that aggressively fight scraping will detect that you are running in a headless browser. Some possible workarounds: Use puppeteer-extra, found here: https://github.com/berstend/puppeteer-extra. Check out their docs for how to use it. It has a couple of plugins that might help in getting …

Message “Async callback was not invoked within the 5000 ms timeout specified by jest.setTimeout”

The timeout you specify here needs to be shorter than the default timeout. The default timeout is 5000 ms, and the default framework in Jest is Jasmine (jest-circus since Jest 27). You can specify the timeout inside the test by adding jest.setTimeout(30000); but this only applies to that test file. Or you can set up the configuration …