Common settings.py configuration for Scrapy

Date: 2018-12-17 20:51:39

I. Saving log output

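import os

# Log level, stdout redirection, and encoding
LOG_LEVEL = "INFO"
LOG_STDOUT = True  # also send the process's standard output (e.g. print) to the log
LOG_ENCODING = 'utf-8'
# File that log output is saved to, placed next to this settings file.
# To use a directory higher up, nest the calls: os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
LOG_FILE = os.path.join(os.path.dirname(__file__), "SHANGSHIYAOPINGMULU_error.log")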
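The nested os.path.dirname calls mentioned in the comment can be hard to read at a glance. The following is a minimal sketch of what they do, assuming a hypothetical project layout; the helper name ancestor_dir and the example path are illustrations only, not part of the original settings.

```python
import os


def ancestor_dir(path, levels):
    """Apply os.path.dirname `levels` times and return the resulting directory."""
    for _ in range(levels):
        path = os.path.dirname(path)
    return path


if __name__ == "__main__":
    # Hypothetical location of settings.py; adjust to the real project layout.
    settings_file = "/project/spider/spider/settings.py"
    print(ancestor_dir(settings_file, 1))  # directory that contains settings.py
    print(ancestor_dir(settings_file, 3))  # two levels further up
```

One call gives the folder that holds settings.py, which is where LOG_FILE ends up in the snippet above; three calls climb two more levels.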

II. Disabling redirects

REDIRECT_ENABLED = False

III. Setting the download delay

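import random
# Evaluated once when the settings module is loaded: a base delay between 0 and 2 seconds
DOWNLOAD_DELAY = random.random() + random.random()
# For each request, Scrapy then waits between 0.5x and 1.5x DOWNLOAD_DELAY
RANDOMIZE_DOWNLOAD_DELAY = True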
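To make the interaction of the two settings concrete, here is a small model of the wait applied per request. It is a sketch of the documented behavior (a uniform draw between 0.5 and 1.5 times the base delay), not Scrapy's own code; the function name per_request_delay is made up for this example.

```python
import random


def per_request_delay(base_delay, rng=random):
    """Model the wait used when RANDOMIZE_DOWNLOAD_DELAY is True.

    The wait is drawn uniformly from [0.5 * base_delay, 1.5 * base_delay].
    """
    return rng.uniform(0.5 * base_delay, 1.5 * base_delay)


if __name__ == "__main__":
    # Same expression as in the settings above; it is computed only once.
    base = random.random() + random.random()
    samples = [per_request_delay(base) for _ in range(5)]
    print(f"base delay: {base:.2f}s")
    print("per-request waits:", [f"{s:.2f}s" for s in samples])
```

Because the base value is fixed at load time, the variation between requests comes from RANDOMIZE_DOWNLOAD_DELAY, not from the random.random() calls in the settings file.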

IV. Setting USER_AGENT

USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'

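V. Running all spiders under the spider package

1. Create a commands folder at the same level as the spider directory

mkdir commands

2. Enter the commands folder

cd commands

3. Create an __init__.py file (it can be empty)

4. Point Scrapy at the commands package in the settings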

COMMANDS_MODULE = 'spider.commands'
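The steps above set up the commands package but do not show a command module. The sketch below scaffolds the folder with Python and writes one possible command file into it. The file name crawlall.py, the helper scaffold_commands, and the command body are assumptions for illustration; the command relies on Scrapy's ScrapyCommand base class and its crawler_process attribute, and is only loaded by Scrapy itself, not by this script.

```python
import tempfile
from pathlib import Path

# Source of a hypothetical crawlall.py command module (the name is an assumption).
CRAWLALL_SOURCE = '''\
from scrapy.commands import ScrapyCommand


class Command(ScrapyCommand):
    requires_project = True

    def short_desc(self):
        return "Run every spider in the project"

    def run(self, args, opts):
        # Schedule each spider known to the project, then start crawling once.
        for name in self.crawler_process.spider_loader.list():
            self.crawler_process.crawl(name)
        self.crawler_process.start()
'''


def scaffold_commands(package_dir):
    """Create package_dir/commands with __init__.py and crawlall.py; return the folder."""
    commands_dir = Path(package_dir) / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)
    (commands_dir / "__init__.py").touch()
    (commands_dir / "crawlall.py").write_text(CRAWLALL_SOURCE, encoding="utf-8")
    return commands_dir


if __name__ == "__main__":
    # Demonstrated in a temporary directory so nothing in a real project is touched.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_commands(Path(tmp) / "spider")
        print(sorted(p.name for p in created.iterdir()))
```

With COMMANDS_MODULE pointing at the package, Scrapy names each custom command after its module file, so under these assumptions the command would be invoked as scrapy crawlall from the project directory.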

VI. Setting the HTTP status codes that trigger a retry

RETRY_HTTP_CODES = [500, 520]
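As a rough illustration of what this list controls, the sketch below models the retry decision: a response is retried only if its status is in the list and the retry budget is not used up. This is a simplified model, not the code of Scrapy's retry middleware; should_retry and MAX_RETRIES are names invented here, and the value 2 is assumed to mirror Scrapy's default RETRY_TIMES.

```python
RETRY_HTTP_CODES = [500, 520]
MAX_RETRIES = 2  # assumption: mirrors Scrapy's default RETRY_TIMES


def should_retry(status, retries_so_far,
                 retry_codes=RETRY_HTTP_CODES, max_retries=MAX_RETRIES):
    """Return True if a response with this status should be requested again."""
    return status in retry_codes and retries_so_far < max_retries


if __name__ == "__main__":
    print(should_retry(500, 0))  # True: listed code, budget left
    print(should_retry(404, 0))  # False: code not in the list
    print(should_retry(520, 2))  # False: retry budget exhausted
```

Note that assigning RETRY_HTTP_CODES replaces Scrapy's default list, so only 500 and 520 are retried with this setting.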

VII. Configuring Redis

# Redis connection info
REDIS_HOST = "192.168.1.235"
REDIS_PORT = 6379
REDIS_PARAMS = {
    "password": "KangCe@0608",
}

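# 1 (required). Use the scrapy_redis dupefilter so that request deduplication happens in Redis
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# 2 (required). Use the scrapy_redis scheduler so that requests are distributed through Redis
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# 3 (required). Keep the scrapy-redis queues in Redis instead of clearing them, so a crawl can be paused and resumed later
SCHEDULER_PERSIST = True

# 4 (required). RedisPipeline pushes each item onto the Redis list whose key is "<spider.name>:items", for later distributed item processing
# scrapy-redis already implements this pipeline; no extra code is needed, just enable it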
ITEM_PIPELINES = {
    # 'AQI.pipelines.AqiJsonPipeline': 200,
    # 'AQI.pipelines.AqiCSVPipeline': 300,
    # 'AQI.pipelines.AqiRedisPipeline': 400,
    # 'AQI.pipelines.AqiMongoPipeline': 500,
    'scrapy_redis.pipelines.RedisPipeline': 100
}
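The comments above refer to Redis keys derived from the spider name. The sketch below shows how such keys could be built, assuming the key patterns that scrapy-redis uses by default as far as I know them; the spider name aqi and the helper redis_keys are hypothetical.

```python
# Key patterns assumed to match the scrapy-redis defaults.
PIPELINE_KEY = "%(spider)s:items"
SCHEDULER_QUEUE_KEY = "%(spider)s:requests"
SCHEDULER_DUPEFILTER_KEY = "%(spider)s:dupefilter"


def redis_keys(spider_name):
    """Return the Redis keys a spider with this name would use under the assumed patterns."""
    fmt = {"spider": spider_name}
    return {
        "items": PIPELINE_KEY % fmt,                    # list filled by RedisPipeline
        "requests": SCHEDULER_QUEUE_KEY % fmt,          # pending requests of the scheduler
        "dupefilter": SCHEDULER_DUPEFILTER_KEY % fmt,   # fingerprints of seen requests
    }


if __name__ == "__main__":
    for purpose, key in redis_keys("aqi").items():
        print(f"{purpose}: {key}")
```

With SCHEDULER_PERSIST = True, the requests and dupefilter keys stay in Redis after the spider stops, which is what makes pausing and resuming possible.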


Original article: https://www.cnblogs.com/yoyo1216/p/10133703.html
