
Scrapy

Posted: 2018-09-27 21:07:59
Dependencies (Windows; install in this order)
  pip install wheel
  pip install Twisted-xxxxxxxx.whl   # local Twisted wheel matching your Python version
  pip install pywin32
  pip install scrapy


# Create a project
scrapy startproject pro_name
cd pro_name

# Create spiders
scrapy genspider chouti chouti.com
scrapy genspider cnblogs cnblogs.com


# Run a spider
scrapy crawl chouti







1. Create a project
  scrapy startproject project_name
    This generates the following layout:
    project_name/
      project_name/
        - spiders/ (with __init__.py)   # spider files live here
        - items.py                      # item definitions (persistence)
        - pipelines.py                  # item pipelines (persistence)
        - middlewares.py                # middlewares
        - settings.py                   # config file (crawler)
      scrapy.cfg                        # config file (deployment)

2. Create spiders
  cd project_name
  scrapy genspider chouti chouti.com
  scrapy genspider cnblogs cnblogs.com

3. Run a spider
  scrapy crawl chouti
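
The pipelines.py hook mentioned above is an ordinary Python class. A minimal sketch (the class name and output filename are illustrative, and items are assumed to be dict-like):

```python
import json

class JsonLinesPipeline:
    """Hypothetical pipeline: append each scraped item to a .jl file.

    Scrapy calls open_spider/close_spider once per spider run and
    process_item once per item; returning the item hands it to the
    next enabled pipeline.
    """

    def open_spider(self, spider):
        self.file = open("items.jl", "a", encoding="utf-8")

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        self.file.close()
```

To enable it, register the class in settings.py, e.g. `ITEM_PIPELINES = {"project_name.pipelines.JsonLinesPipeline": 300}` (lower numbers run first).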
  

  

命令行
  

Available commands:
    bench    Run quick benchmark test
    check    Check spider contracts
    crawl    Run a spider
    edit     Edit spider
    fetch    Fetch a URL using the Scrapy downloader
    genspider Generate new spider using pre-defined templates
    list     List available spiders
    parse    Parse URL (using its spider) and print the results
    runspider  Run a self-contained spider (without creating a project)
    settings   Get settings values
    shell    Interactive scraping console
    startproject Create new project
    version   Print Scrapy version
    view     Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command




Encoding issue (Windows console can fail to print some characters; re-wrap stdout):
import sys, io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')
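
The same TextIOWrapper trick can be checked against an in-memory buffer instead of sys.stdout; a small sketch showing that gb18030 round-trips Chinese text:

```python
import io

# Wrap a byte buffer the same way the snippet above wraps sys.stdout.buffer.
buf = io.BytesIO()
wrapper = io.TextIOWrapper(buf, encoding="gb18030")
wrapper.write("首页")   # text an ASCII-only console would choke on
wrapper.flush()

encoded = buf.getvalue()
assert encoded.decode("gb18030") == "首页"
```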









 


Source: https://www.cnblogs.com/yanxiatingyu/p/9715432.html
