Scrapy: An Introduction to Web Crawlers

Posted: 2018-07-08 16:25:54

Crawler basics

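　　1. Applications

　　　　- Public-opinion monitoring systems: track trending topics and top news on the major portal sites, then analyze, process, and display the results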

　　2. Crawlers

　　- Focused (targets specific sites or topics)

　　- Unfocused (general-purpose, follows links broadly)

　　- Download the page:

　　　　　　http://www.autohome.com.cn/news/

　　- Filter (extract the data you need):

　　　　　　Regular expressions
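The sketch below shows the download-then-filter flow of a focused crawler, starting from HTML that has already been downloaded. It uses only Python's standard library: `re` does the filtering and `urllib.parse.urljoin` makes the links absolute. The HTML fragment and the `<a href><h3>title</h3></a>` layout are made up for illustration. The names `extract_articles` and `keep_focused` are hypothetical helpers, and the real autohome markup will differ. The prefix check is one simple way to keep a crawler "focused" on a single section.

```python
import re
from urllib.parse import urljoin

# Hypothetical markup: each article is assumed to look like
# <a href="URL"><h3>TITLE</h3></a>. The real page layout will differ.
ARTICLE_RE = re.compile(r'<a href="([^"]+)">\s*<h3>(.*?)</h3>', re.S)

def extract_articles(html, base_url):
    """Filter step: pull (absolute_url, title) pairs out of raw HTML with a regex."""
    return [(urljoin(base_url, href), title.strip())
            for href, title in ARTICLE_RE.findall(html)]

def keep_focused(articles, prefix):
    """Focused crawling: keep only links under the target section (prefix)."""
    return [(url, title) for url, title in articles if url.startswith(prefix)]

base = "http://www.autohome.com.cn/news/"
# Made-up sample standing in for a downloaded page.
sample = """
<ul>
  <li><a href="/news/201807/1.html"><h3>Title one</h3></a></li>
  <li><a href="/video/2.html">
        <h3>Title two</h3></a></li>
</ul>
"""
articles = extract_articles(sample, base)
print(articles)                      # both links, made absolute
print(keep_focused(articles, base))  # only the /news/ link survives
```

Regexes like this break easily when the page's markup changes. They are fine for quick, narrowly targeted extraction.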

　　======= Open-source modules =======

　　1. requests

　　　　pip3 install requests

　　　　import requests

　　　　response = requests.get('http://www.autohome.com.cn/news/')

　　　　print(response.text)  # the page's HTML as a string

　　2. beautifulsoup

　　　　pip3 install beautifulsoup4

　　　　from bs4 import BeautifulSoup

　　　　soup = BeautifulSoup(response.text, features='html.parser')  # parse the HTML into a tree of nested objects

　　　　target = soup.find(id='auto-channel-lazyload-article')  # first element with this id, or None

　　　　print(target)

Crawler concurrency options

　　　　- Asynchronous IO: gevent / Twisted / asyncio / aiohttp

　　　　- IO multiplexing: select
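Here is a minimal sketch of the asynchronous-IO approach using only the standard-library `asyncio`. `fake_fetch` is a hypothetical stand-in that sleeps instead of sending a real HTTP request; in a real crawler a library such as aiohttp would do the fetching. Each coroutine yields to the event loop while it waits, so all the "downloads" overlap. On Unix, asyncio's default event loop is built on the `selectors` module, the same select/epoll-style IO multiplexing listed above.

```python
import asyncio
import time

async def fake_fetch(url, delay):
    # Hypothetical stand-in for a network request: awaiting asyncio.sleep
    # yields to the event loop just as awaiting a socket read would.
    await asyncio.sleep(delay)
    return url, len(url)

async def crawl(urls, delay=0.2):
    # Start every "download" at once; gather returns results in input order.
    return await asyncio.gather(*(fake_fetch(u, delay) for u in urls))

if __name__ == "__main__":
    urls = [f"http://example.com/page/{i}" for i in range(5)]
    start = time.perf_counter()
    results = asyncio.run(crawl(urls))
    elapsed = time.perf_counter() - start
    print(results)
    print(f"{len(urls)} requests in {elapsed:.2f}s")  # roughly 0.2s, not 1.0s
```

With a sequential loop the total time would be the sum of the waits. Here it is close to the single longest wait.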

Scrapy framework

　　　　- Asynchronous IO: built on Twisted

Original post: https://www.cnblogs.com/benchdog/p/9280051.html
