Python Web Scraping

Date: 2017-07-24 09:22:58

1. Fetch the website you want to scrape with Requests

import requests

r = requests.get('https://www.baidu.com')
print(r.text)  # print the page source

Note: the Requests library must be installed before use. To install it, run this on the command line:

pip install requests
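The bare `requests.get` call above does not check whether the request succeeded. A slightly more defensive version might look like the sketch below. The function name `fetch_html` and the 10-second default timeout are assumptions made for illustration, not part of the original post; re-guessing the encoding from the response body is an optional step that can help when non-ASCII pages come back garbled.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its source as text.

    `fetch_html` and the default timeout are illustrative choices.
    """
    # Without a timeout, requests can wait indefinitely on a stalled server.
    response = requests.get(url, timeout=timeout)

    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning an error page.
    response.raise_for_status()

    # Optional: re-guess the encoding from the body. This can help when
    # the server does not declare a charset and the text looks garbled.
    response.encoding = response.apparent_encoding

    return response.text
```

Calling `print(fetch_html('https://www.baidu.com'))` should then print the page source, as in the original snippet.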

2. Parse the page source with Beautiful Soup

Installation:

pip install beautifulsoup4

Parsing:

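from bs4 import BeautifulSoup  # import the BeautifulSoup library
import requests                # import the Requests library

r = requests.get('https://www.baidu.com')
html = r.text                                # get the page source
soup = BeautifulSoup(html, 'html.parser')    # create a BeautifulSoup object with an explicit parser
print(soup.a)                                # get the first link (<a> tag) on the page
# ...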
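`soup.a` returns only the first `<a>` tag. To collect every link on a page, `find_all` can be used instead. The sketch below runs on a small hard-coded HTML string so that it works without network access; `SAMPLE_HTML` and `extract_links` are hypothetical names introduced for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/news">News</a>
  <a href="/about">About</a>
  <a>No href here</a>
</body></html>
"""


def extract_links(html):
    """Return (link text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for tag in soup.find_all("a"):
        href = tag.get("href")  # None when the attribute is missing
        if href is not None:
            links.append((tag.get_text(strip=True), href))
    return links


for text, href in extract_links(SAMPLE_HTML):
    print(text, href)
```

In a real scraper, the string passed to `extract_links` would be the `r.text` obtained from Requests.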

PS:

For details on using the Requests library, see http://docs.python-requests.org/zh_CN/latest/

For details on using the BeautifulSoup library, see https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/


Original article: http://www.cnblogs.com/siyu1915/p/7226951.html
