How to give a URL to Scrapy for crawling?

I'm not really sure about the command-line option. However, you could write your spider like this:

    from scrapy.spider import BaseSpider

    class MySpider(BaseSpider):
        name = "my_spider"

        def __init__(self, *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            # Take the start URL from the -a argument passed on the command line
            self.start_urls = [kwargs.get('start_url')]

And start it like:

    scrapy crawl my_spider -a start_url="http://some_url"
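On newer Scrapy versions (1.0+), scrapy.Spider replaces BaseSpider, and -a arguments arrive as keyword arguments to __init__. A minimal sketch of the same approach on current versions, keeping the names from above:

    import scrapy

    class MySpider(scrapy.Spider):
        name = "my_spider"

        def __init__(self, start_url=None, *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            # -a start_url=... is delivered here as a keyword argument
            self.start_urls = [start_url]

The invocation is unchanged: scrapy crawl my_spider -a start_url="http://some_url"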

Run a Scrapy spider in a Celery task

The Twisted reactor cannot be restarted. A workaround is to have the Celery task fork a new child process for each crawl you want to execute, as proposed in the following post: Running Scrapy spiders in a Celery task. This gets around the "ReactorNotRestartable" issue by utilizing the multiprocessing package.
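A minimal sketch of that workaround, assuming a Redis broker and illustrative names (run_spider, MySpider, the broker URL) that are not from the original post:

    from multiprocessing import Process

    import scrapy
    from celery import Celery
    from scrapy.crawler import CrawlerProcess

    # Broker URL is an assumption; use whatever your Celery setup points at.
    app = Celery('tasks', broker='redis://localhost:6379/0')

    class MySpider(scrapy.Spider):
        name = 'my_spider'

        def __init__(self, start_url=None, *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            self.start_urls = [start_url]

        def parse(self, response):
            yield {'url': response.url}

    def _crawl(start_url):
        # Runs inside the child process: the reactor starts fresh here and
        # dies with the process, so it never has to be restarted.
        process = CrawlerProcess()
        process.crawl(MySpider, start_url=start_url)
        process.start()  # blocks until the crawl finishes

    @app.task
    def run_spider(start_url):
        # Fork one child process per crawl instead of running Scrapy
        # inside the long-lived Celery worker process.
        p = Process(target=_crawl, args=(start_url,))
        p.start()
        p.join()

Because each crawl gets its own process, the worker can execute the task repeatedly without ever touching a previously stopped reactor.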