1. Installation
pip3 install -i https://pypi.douban.com/simple scrapy-redis
2. Configuration file
Scrapy deduplication
DUPEFILTER_KEY = 'dupefilter:%(timestamp)s'
DUPEFILTER_CLASS = 'scrapy_redis.dupefilter.RFPDupeFilter'
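Connecting Scrapy to Redis
REDIS_HOST = 'ip'
REDIS_PORT = 6379  # replace with your port number (6379 is the Redis default)
REDIS_PARAMS = {'password': 'password'}
REDIS_ENCODING = "utf-8"
or
# REDIS_URL = 'redis://user:password@ip:port'  (takes precedence over the settings above)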
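The precedence rule above can be illustrated with a small stand-alone sketch. Note that `describe_redis_target` is a hypothetical helper written for this note, not part of scrapy-redis, and the addresses and passwords are made-up values; it only mimics the rule "use REDIS_URL if present, otherwise fall back to host/port/params".

```python
from urllib.parse import urlsplit


def describe_redis_target(settings):
    """Return (host, port, password), letting REDIS_URL win when it is set.

    Hypothetical helper for illustration only; scrapy-redis builds a real
    client object instead of returning a tuple.
    """
    url = settings.get('REDIS_URL')
    if url:
        parts = urlsplit(url)
        return parts.hostname, parts.port, parts.password
    params = settings.get('REDIS_PARAMS', {})
    return (settings.get('REDIS_HOST'),
            settings.get('REDIS_PORT'),
            params.get('password'))


# Made-up example values.
settings = {
    'REDIS_HOST': '127.0.0.1',
    'REDIS_PORT': 6379,
    'REDIS_PARAMS': {'password': 'secret'},
}
host_only = describe_redis_target(settings)
print(host_only)   # ('127.0.0.1', 6379, 'secret')

# Once REDIS_URL is added, the host/port/params entries are ignored.
settings['REDIS_URL'] = 'redis://user:other@10.0.0.5:6380'
with_url = describe_redis_target(settings)
print(with_url)    # ('10.0.0.5', 6380, 'other')
```

In practice this means it is safest to configure only one of the two styles, so nobody is misled by host/port values that are silently overridden.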
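3. Custom class
Subclass RFPDupeFilter and override its from_settings method so that the dedup key is a fixed default rather than a timestamp.
from scrapy_redis import defaults
from scrapy_redis.connection import get_redis_from_settings
from scrapy_redis.dupefilter import RFPDupeFilter


class RedisDupeFilter(RFPDupeFilter):
    @classmethod
    def from_settings(cls, settings):
        # Build the Redis client from the REDIS_* settings.
        server = get_redis_from_settings(settings)
        # Fill the key template with a constant; 'fixed_key' is a placeholder, use any fixed string.
        key = defaults.DUPEFILTER_KEY % {'timestamp': 'fixed_key'}
        debug = settings.getbool('DUPEFILTER_DEBUG')
        return cls(server, key=key, debug=debug)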
Finally, change DUPEFILTER_CLASS in the configuration file to the import path of this custom class.
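To see why a fixed key matters, here is a minimal sketch of how the key template expands. It assumes the stock filter fills the `timestamp` placeholder with the current Unix time, so every start-up gets a new Redis key, while the custom class always produces the same one. The helper names and the module path `myproject.dupefilters` are hypothetical and should be adapted to your own project layout.

```python
import time

# Same template as DUPEFILTER_KEY in the configuration above.
DUPEFILTER_KEY = 'dupefilter:%(timestamp)s'


def timestamped_key(now=None):
    """Key in the style assumed for the stock filter: changes with start-up time."""
    now = int(time.time()) if now is None else int(now)
    return DUPEFILTER_KEY % {'timestamp': now}


def fixed_key(name='fixed_key'):
    """Key in the style of the custom class: identical on every run."""
    return DUPEFILTER_KEY % {'timestamp': name}


# settings.py entry; 'myproject.dupefilters' is a hypothetical module path.
DUPEFILTER_CLASS = 'myproject.dupefilters.RedisDupeFilter'

print(timestamped_key(1700000000))  # dupefilter:1700000000
print(fixed_key())                  # dupefilter:fixed_key
```

With a fixed key, the fingerprints stored in Redis are found again after a restart, so already-seen requests stay filtered across runs.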
Original article: https://www.cnblogs.com/wt7018/p/11756393.html