Reliably detecting PhantomJS-based spam bots

I very much share your take on CAPTCHA. I’ll list what I have been able to detect so far for my own detection script, which has similar goals. The list is only partial, as there are many more headless browsers. It’s fairly safe to use exposed window properties to detect (or assume) those particular headless browsers:

    window._phantom (or window.callPhantom)  // phantomjs
    window.__phantomas …
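The markers above are tested in the browser, but the decision to block is often made on the server. A minimal Python sketch of that server-side step, assuming a client-side script has already checked each marker (for example with `'callPhantom' in window`) and sent back the names it found. `PHANTOM_MARKERS` and `detect_headless` are hypothetical names, and the table holds only the markers listed above.

```python
# Hypothetical mapping of window property names to the headless tool they reveal.
# Only the markers named above are included; extend it as you find more.
PHANTOM_MARKERS = {
    "_phantom": "phantomjs",
    "callPhantom": "phantomjs",
    "__phantomas": "phantomas",
}


def detect_headless(window_props):
    """Return the sorted list of tools whose markers appear in window_props.

    window_props is assumed to be the set of marker names a client-side
    script reported as present on `window`.
    """
    found = {tool for name, tool in PHANTOM_MARKERS.items() if name in window_props}
    return sorted(found)


print(detect_headless({"callPhantom", "_phantom"}))  # ['phantomjs']
print(detect_headless(set()))  # []
```

Keeping the markers in one dictionary means adding a newly discovered headless browser is a one-line change.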

How to get the Infobox data from Wikipedia?

Use the MediaWiki API through this Python library: https://github.com/siznax/wptools

Usage:

    import wptools

    so = wptools.page('Stack Overflow').get_parse()
    infobox = so.data['infobox']
    print(infobox)

Output:

    {'alexa': '{{Increase}} 34 ( {{as of|2019|12|15|lc|=|y}} )',
     'author': '[[Jeff Atwood]] and [[Joel Spolsky]]',
     'caption': 'Screenshot of Stack Overflow in February 2017',
     'commercial': 'Yes',
     'content_license': '[[Creative Commons license|CC-BY-SA]] 4.0',
     'current_status': 'Online',
     'language': 'English, Spanish, Russian, …
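The infobox values come back as raw wikitext, as the output shows (`[[Jeff Atwood]]`, `[[Creative Commons license|CC-BY-SA]]`). If you only want the display text, a minimal sketch using the standard `re` module could strip simple wiki links. `strip_wikilinks` is a hypothetical helper, not part of wptools, and templates such as `{{Increase}}` are left untouched.

```python
import re

# Matches [[Target]] or [[Target|Label]] and captures the text to display.
WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def strip_wikilinks(value):
    """Replace simple [[...]] wiki links in a raw infobox value with their display text."""
    return WIKILINK.sub(r"\1", value)


# Two values copied from the output above.
infobox = {
    "author": "[[Jeff Atwood]] and [[Joel Spolsky]]",
    "content_license": "[[Creative Commons license|CC-BY-SA]] 4.0",
}
clean = {key: strip_wikilinks(val) for key, val in infobox.items()}
print(clean["author"])  # Jeff Atwood and Joel Spolsky
print(clean["content_license"])  # CC-BY-SA 4.0
```

This handles only flat `[[Target]]` and `[[Target|Label]]` links; nested markup would need a proper wikitext parser.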