Why does headless need to be false for Puppeteer to work?

The reason it might work in UI mode but not in headless mode is that sites that aggressively fight scraping can detect that you are running a headless browser.
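Here is a deliberately naive sketch of the kind of check a site might run to detect a headless browser. It uses two commonly cited signals. Headless Chrome's default user agent has historically contained "HeadlessChrome", and navigator.webdriver is typically true in automated browsers. The function name looksAutomated and the sample user-agent strings are made up for illustration. Real anti-bot scripts combine many more signals than this.

```javascript
// Hypothetical, deliberately naive detector built on two well-known signals.
// Real anti-bot scripts combine far more checks than this.
function looksAutomated({ userAgent = "", webdriver = false } = {}) {
  const reasons = [];
  // Headless Chrome's default user agent has historically included "HeadlessChrome/".
  if (/HeadlessChrome\//.test(userAgent)) {
    reasons.push("user agent contains HeadlessChrome");
  }
  // navigator.webdriver is true in browsers under automation control.
  if (webdriver === true) {
    reasons.push("navigator.webdriver is true");
  }
  return reasons;
}

// Made-up user-agent strings, for illustration only.
const headlessUA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/85.0.4182.0 Safari/537.36";
const desktopUA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4182.0 Safari/537.36";

console.log(looksAutomated({ userAgent: headlessUA, webdriver: true })); // both signals fire
console.log(looksAutomated({ userAgent: desktopUA, webdriver: false })); // no signals
```

In a live session you would gather the two inputs from the page itself, for example with page.evaluate(() => ({ userAgent: navigator.userAgent, webdriver: navigator.webdriver })), and pass the result to looksAutomated.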

Some possible workarounds:

Use puppeteer-extra

It’s available here: https://github.com/berstend/puppeteer-extra
Check out its docs for how to use it. It has a couple of plugins that might help you get past headless-mode detection:

  1. puppeteer-extra-plugin-anonymize-ua: anonymizes your user agent. Note that this might help with getting past headless-mode detection, but as you’ll see if you visit https://amiunique.org/, it is unlikely to be enough to keep you from being identified as a repeat visitor.
  2. puppeteer-extra-plugin-stealth: this might help you win the cat-and-mouse game of not being detected as headless. There are many tricks employed to detect headless mode, and just as many tricks to evade them.
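Here is a minimal sketch of wiring both plugins into puppeteer-extra. It follows the require-then-use pattern shown in the puppeteer-extra README. It assumes puppeteer, puppeteer-extra, puppeteer-extra-plugin-stealth and puppeteer-extra-plugin-anonymize-ua are installed from npm. getExtraPuppeteer and openStealthPage are hypothetical helper names, not part of the library.

```javascript
// Sketch only. Assumes these packages are installed:
//   npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth puppeteer-extra-plugin-anonymize-ua
let extraPuppeteer = null;

// Load puppeteer-extra and register the chosen plugins once per process.
// The requires are deferred so this module can load even where the packages are missing.
// Note: options passed on later calls are ignored once the plugins are registered.
function getExtraPuppeteer({ stealth = true, anonymizeUA = true } = {}) {
  if (extraPuppeteer) return extraPuppeteer;
  const puppeteer = require("puppeteer-extra");
  if (stealth) {
    // Stealth plugin with its default set of evasions.
    puppeteer.use(require("puppeteer-extra-plugin-stealth")());
  }
  if (anonymizeUA) {
    // Rewrites the user agent reported by pages.
    puppeteer.use(require("puppeteer-extra-plugin-anonymize-ua")());
  }
  extraPuppeteer = puppeteer;
  return puppeteer;
}

// Launch headless Chromium with the plugins applied and navigate to `url`.
// The caller is responsible for calling browser.close() afterwards.
async function openStealthPage(url, options) {
  const puppeteer = getExtraPuppeteer(options);
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return { browser, page };
  } catch (err) {
    // Don't leak a browser process if navigation fails.
    await browser.close();
    throw err;
  }
}

// Usage (inside an async function):
//   const { browser, page } = await openStealthPage("https://example.com");
//   console.log(await page.title());
//   await browser.close();
```

The plugins are registered through a single cached instance. The reason is that puppeteer-extra's default export is shared across requires, so calling use() on every launch would likely keep stacking duplicate plugins.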

Run a “real” Chromium instance/UI

It’s possible to run a single browser UI in a way that lets you attach Puppeteer to that running instance. Here’s an article that explains it: https://medium.com/@jaredpotter1/connecting-puppeteer-to-existing-chrome-window-8a10828149e0

Essentially, you start Chrome or Chromium (Edge, being Chromium-based, should work too) from the command line with --remote-debugging-port=9222 (any free port should do), plus whatever other command-line switches your environment needs. Then, instead of letting Puppeteer do its default of launching its own headless Chromium instance, you connect it to that running instance: const browser = await puppeteer.connect({ browserURL: ENDPOINT_URL });. Read the Puppeteer docs here for more info: https://pptr.dev/#?product=Puppeteer&version=v5.2.1&show=api-puppeteerlaunchoptions
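The launch-then-connect steps could be scripted from Node roughly as follows. This is a sketch under some assumptions. chromePath is whatever the Chrome or Chromium executable is called on your system. The profile directory is a throwaway location chosen for illustration. chromeDebugArgs, browserUrlFor, connectWithRetry and launchAndConnect are hypothetical helpers. puppeteer.connect with a browserURL option is part of Puppeteer's API, and Puppeteer uses that HTTP address to look up the browser's WebSocket endpoint.

```javascript
const { spawn } = require("child_process");
const os = require("os");
const path = require("path");

// Switches for a Chrome/Chromium instance Puppeteer can attach to.
// If a Chrome using the same profile is already running, a new launch hands off
// to it and the debugging switch is ignored, so use a separate profile directory.
function chromeDebugArgs(port = 9222, userDataDir = path.join(os.tmpdir(), "chrome-debug-profile")) {
  return [`--remote-debugging-port=${port}`, `--user-data-dir=${userDataDir}`];
}

// The plain HTTP address that puppeteer.connect accepts as browserURL.
function browserUrlFor(port = 9222, host = "127.0.0.1") {
  return `http://${host}:${port}`;
}

// Chrome needs a moment before the debugging port accepts connections, so retry.
async function connectWithRetry(puppeteer, browserURL, attempts = 20, delayMs = 250) {
  for (let i = 1; ; i++) {
    try {
      return await puppeteer.connect({ browserURL });
    } catch (err) {
      if (i >= attempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Start Chrome from chromePath (an assumption: adjust for your OS) and attach to it.
async function launchAndConnect(chromePath, port = 9222) {
  const chrome = spawn(chromePath, chromeDebugArgs(port), { stdio: "ignore" });
  const puppeteer = require("puppeteer"); // deferred so the helpers above load without it
  const browser = await connectWithRetry(puppeteer, browserUrlFor(port));
  return { chrome, browser };
}
```

When you're done, browser.disconnect() detaches Puppeteer and leaves Chrome running, whereas browser.close() shuts the browser down.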

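When you launch the browser from the command line with the --remote-debugging-port=9222 option, it prints a line like DevTools listening on ws://127.0.0.1:9222/devtools/browser/<id> in the terminal. That ws:// address is what Puppeteer's browserWSEndpoint option expects. The browserURL option takes the plain HTTP address instead, so ENDPOINT_URL in the snippet above would be http://127.0.0.1:9222, and Puppeteer looks up the WebSocket endpoint from it.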
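If you'd rather use the ws:// address printed in the terminal directly, it goes in browserWSEndpoint. A small sketch, assuming the output line has the form "DevTools listening on ws://<host>:<port>/devtools/browser/<id>". parseDevToolsEndpoint and connectToEndpoint are hypothetical names, and the sample endpoint is a made-up placeholder.

```javascript
// Extract the WebSocket endpoint from Chrome's startup output, which contains a line like
//   DevTools listening on ws://127.0.0.1:9222/devtools/browser/<id>
// Returns null if no such line is present.
function parseDevToolsEndpoint(output) {
  const match = /DevTools listening on (ws:\/\/\S+)/.exec(output);
  return match ? match[1] : null;
}

// Attach Puppeteer to the running browser via the ws:// address.
async function connectToEndpoint(wsEndpoint) {
  const puppeteer = require("puppeteer"); // assumes puppeteer is installed
  return puppeteer.connect({ browserWSEndpoint: wsEndpoint });
}

// Made-up sample output, not a real session.
const sample = "\nDevTools listening on ws://127.0.0.1:9222/devtools/browser/0f1e2d3c-example\n";
console.log(parseDevToolsEndpoint(sample));
```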

This option is going to require some server/ops mojo, so be prepared to do a lot more Stack Overflow searches. 🙂

There are other strategies, I’m sure, but those are the two I’m most familiar with. Good luck!
