How to programmatically fill input elements built with React?

This accepted solution appears not to work in React > 15.6 (including React 16) as a result of changes to de-dupe input and change events. You can see the React discussion here: https://github.com/facebook/react/issues/10135 And the suggested workaround here: https://github.com/facebook/react/issues/10135#issuecomment-314441175 Reproduced here for convenience: Instead of input.value="foo"; input.dispatchEvent(new Event('input', {bubbles: true})); you would use function setNativeValue(element, … Read more
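If you are driving the page from Python with Selenium (an assumption; the original answer is plain JavaScript), the same native-setter workaround can be injected through execute_script. The URL, selector, and value below are placeholders, and the injected script is a simplified form of the setNativeValue helper from the linked issue comment:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder page with a React-controlled input

element = driver.find_element(By.CSS_SELECTOR, "input[name='q']")  # placeholder selector

# Set the value through the native HTMLInputElement setter so React's internal
# value tracker notices the change, then dispatch a bubbling 'input' event so
# the component's onChange handler fires.
driver.execute_script(
    """
    const element = arguments[0];
    const value = arguments[1];
    const setter = Object.getOwnPropertyDescriptor(
        window.HTMLInputElement.prototype, 'value'
    ).set;
    setter.call(element, value);
    element.dispatchEvent(new Event('input', { bubbles: true }));
    """,
    element,
    "foo",
)
```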

Scrapy CrawlSpider doesn’t crawl the first landing page

Just change your callback to parse_start_url and override it: from scrapy.contrib.spiders import CrawlSpider, Rule from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor class DownloadSpider(CrawlSpider): name="downloader" allowed_domains = ['bnt-chemicals.de'] start_urls = [ "http://www.bnt-chemicals.de", ] rules = ( Rule(SgmlLinkExtractor(allow='prod'), callback='parse_start_url', follow=True), ) fname = 0 def parse_start_url(self, response): self.fname += 1 fname = "%s.txt" % self.fname with open(fname, 'w') as f: f.write('%s, %s\n' … Read more
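The excerpt uses the old scrapy.contrib and SgmlLinkExtractor paths, which were removed in later Scrapy releases. A minimal sketch of the same idea against current module paths (the logging/yield body is an assumption, since the original file-writing code is truncated above):

```python
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class DownloadSpider(CrawlSpider):
    name = "downloader"
    allowed_domains = ["bnt-chemicals.de"]
    start_urls = ["http://www.bnt-chemicals.de"]

    rules = (
        # Pointing the rule's callback at parse_start_url means the crawled
        # pages and the landing page all go through the same method.
        Rule(LinkExtractor(allow="prod"), callback="parse_start_url", follow=True),
    )

    def parse_start_url(self, response):
        # CrawlSpider calls this hook for the start URLs as well, so the first
        # landing page is no longer skipped.
        self.logger.info("Parsed %s", response.url)
        yield {"url": response.url}
```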

Crawling the Google Play store

First of all, Google Play's robots.txt does NOT disallow pages under "/store/apps". If you want to crawl Google Play you would need to develop your own web crawler, parse the HTML pages and extract the app metadata you need (e.g. title, description, price). This topic has been covered in this other question. … Read more
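You can check the robots.txt claim yourself with the standard library; a minimal sketch (the app id in the URL is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://play.google.com/robots.txt")
rp.read()

url = "https://play.google.com/store/apps/details?id=com.example.app"
# True means a generic crawler ("*") is not disallowed from fetching this path.
print(rp.can_fetch("*", url))
```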

Creating a generic scrapy spider

You could create a run-time spider which is evaluated by the interpreter. This code piece could be evaluated at runtime like so: a = open("test.py") from compiler import compile d = compile(a.read(), 'spider.py', 'exec') eval(d) MySpider <class '__main__.MySpider'> print MySpider.start_urls ['http://www.somedomain.com']
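Note that the compiler module shown above is Python 2 only and no longer exists in Python 3. A minimal sketch of the same runtime-evaluation idea using the built-in compile() and exec() (the test.py file defining MySpider is assumed, as in the original):

```python
# test.py is expected to contain a spider definition such as:
#     import scrapy
#     class MySpider(scrapy.Spider):
#         name = "myspider"
#         start_urls = ["http://www.somedomain.com"]
with open("test.py") as f:
    source = f.read()

code = compile(source, "spider.py", "exec")

namespace = {}
exec(code, namespace)  # evaluate the spider definition at runtime

MySpider = namespace["MySpider"]
print(MySpider.start_urls)  # ['http://www.somedomain.com']
```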

scrapy- how to stop Redirect (302)

Yes, you can do this simply by adding meta values like meta={'dont_redirect': True}. You can also stop redirects for a particular response code, like meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, which will stop redirecting only 302 response codes. You can add as many HTTP status codes as you want to avoid redirecting them. Example: yield Request('some url', meta = … Read more
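A minimal sketch of those meta keys inside a spider callback (the spider, URLs and second callback are placeholders, not part of the original answer):

```python
import scrapy


class NoRedirectSpider(scrapy.Spider):
    name = "no_redirect"
    start_urls = ["https://example.com"]

    def parse(self, response):
        yield scrapy.Request(
            "https://example.com/some-url",
            meta={
                "dont_redirect": True,            # RedirectMiddleware will not follow the redirect
                "handle_httpstatus_list": [302],  # let 302 responses reach the callback
            },
            callback=self.parse_no_redirect,
        )

    def parse_no_redirect(self, response):
        # With the meta keys above, a 302 response is delivered here as-is.
        self.logger.info("Got %s for %s", response.status, response.url)
```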

Python: Disable images in Selenium Google ChromeDriver

Here is another way to disable images: from selenium import webdriver chrome_options = webdriver.ChromeOptions() prefs = {"profile.managed_default_content_settings.images": 2} chrome_options.add_experimental_option("prefs", prefs) driver = webdriver.Chrome(chrome_options=chrome_options) I found it below: http://nullege.com/codes/show/src@o@s@[email protected]/56/selenium.webdriver.ChromeOptions.add_experimental_option
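In current Selenium releases the chrome_options keyword argument has been replaced by options; a minimal sketch of the same preference under the newer API (the target URL is a placeholder):

```python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option(
    "prefs",
    {"profile.managed_default_content_settings.images": 2},  # 2 = block images
)

driver = webdriver.Chrome(options=options)
driver.get("https://example.com")
```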