scrapy-redis distributed crawler using the POST method

Time: 2021-05-25 15:34:18

Copy the src directory from the scrapy-redis source code into your own project.

from scrapy_redis.spiders import RedisSpider


class HmmSpider(RedisSpider):
    name = 'spider_redis'
    allowed_domains = ['xxx1.com']
    redis_key = "redisqueue_online"  # Redis key the spider reads its tasks from

    custom_settings = {
        'SCHEDULER': "scrapy_redis.scheduler.Scheduler",  # store the request queue in Redis
        'SCHEDULER_PERSIST': True,  # keep the Redis queue so a crawl can be paused and resumed
        'DUPEFILTER_CLASS': "scrapy_redis.dupefilter.RFPDupeFilter",  # all spiders deduplicate through Redis
        'SCHEDULER_QUEUE_CLASS': 'scrapy_redis.queue.SpiderPriorityQueue',

        'REDIS_HOST': '111.131.124.111',
        'REDIS_PORT': 6379,  # integer port rather than the string '6379'
        'REDIS_ENCODING': 'utf-8',
        'REDIS_PARAMS': {'password': '1234'},
    }
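
The spider above only carries the Redis settings; it does not yet send a POST request. A minimal sketch of one way to do that is to override make_request_from_data, the RedisSpider hook that turns one queue item into a request, and return a Scrapy FormRequest. The JSON payload layout ({"url": ..., "form": {...}}), the helper names build_task and parse_task, and the PostSpider class are assumptions made for illustration, not part of the original post, and the hook's exact behavior may differ between scrapy-redis versions.

```python
import json

try:
    from scrapy import FormRequest
    from scrapy_redis.spiders import RedisSpider
except ImportError:
    # Fallback so the payload helpers below still run without Scrapy installed
    FormRequest = None
    RedisSpider = object


def build_task(url, form):
    """Serialize one task for the Redis queue (hypothetical payload layout)."""
    return json.dumps({"url": url, "form": form})


def parse_task(data):
    """Decode one Redis queue item into (url, formdata)."""
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    task = json.loads(data)
    # Convert keys and values to strings before form-encoding them
    formdata = {str(k): str(v) for k, v in task.get("form", {}).items()}
    return task["url"], formdata


class PostSpider(RedisSpider):
    name = "spider_redis_post"       # hypothetical spider name
    redis_key = "redisqueue_online"  # same key as the spider above

    def make_request_from_data(self, data):
        # Called by scrapy-redis for each item popped from redis_key
        url, formdata = parse_task(data)
        return FormRequest(url, method="POST", formdata=formdata,
                           callback=self.parse)

    def parse(self, response):
        self.logger.info("POST %s returned %s", response.url, response.status)
```

Under these assumptions, a task would be queued with something like redis-cli lpush redisqueue_online '{"url": "https://xxx1.com/api", "form": {"page": 1}}', and the custom_settings from HmmSpider can be reused unchanged on PostSpider.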

Original: https://blog.51cto.com/u_13888585/2811006
