In the new version (Scrapy 1.1, released 2016-05-11), the crawler downloads robots.txt before crawling and obeys it by default. To change this behavior, set the following in your settings.py:
ROBOTSTXT_OBEY = False
Here are the release notes
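For reference, a minimal settings.py excerpt showing the override in context (the surrounding project settings are assumed, not part of the original answer):

```python
# settings.py -- relevant excerpt for a Scrapy (>= 1.1) project.
# ROBOTSTXT_OBEY defaults to True in projects generated by `scrapy startproject`,
# which makes the crawler fetch and respect robots.txt before crawling a site.
# Setting it to False skips the robots.txt request entirely.
ROBOTSTXT_OBEY = False
```

If you only want this for a single spider rather than the whole project, Scrapy also supports per-spider overrides via the `custom_settings` class attribute on a Spider subclass, e.g. `custom_settings = {"ROBOTSTXT_OBEY": False}`.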