01. Blog Crawler

Date: 2019-04-11 21:04:06

The task is to scrape every comment on the article 《未来已来（四）——Python学习进阶图谱》 from the blog 【人人都是蜘蛛侠】 and print them.

Article URL: https://wordpress-edu-3autumn.localprod.forc.work/all-about-the-future_04/

# 1. Blog crawler
#    Scrape every comment on the article 《未来已来（四）——Python学习进阶图谱》 from the blog 【人人都是蜘蛛侠】 and print them.
#    Article URL: https://wordpress-edu-3autumn.localprod.forc.work/all-about-the-future_04/
import requests
from bs4 import BeautifulSoup

# Download the article page
res = requests.get('https://wordpress-edu-3autumn.localprod.forc.work/all-about-the-future_04/')
html = res.text
# Parse the HTML with Python's built-in html.parser
soup = BeautifulSoup(html, 'html.parser')
# Each comment body sits in a <div class="comment-content">
items = soup.find_all('div', class_='comment-content')
for item in items:
    # Print the text of the first <p> inside the comment
    print(item.find('p').text)

'''
Output:
    测试评论
    我们就是
    minu
    kpi
'''

'''
#   The teacher's code follows

#   Import the requests library
import requests
#   Import the BeautifulSoup library
from bs4 import BeautifulSoup
#   Assign the URL to the variable url_destnation
url_destnation = 'https://wordpress-edu-3autumn.localprod.forc.work/all-about-the-future_04/'
#   requests.get returns a Response object, which is assigned to res_comment
res_comment = requests.get(url_destnation)
#   Parse the page into a BeautifulSoup object
bs_comment = BeautifulSoup(res_comment.text, 'html.parser')
#   Extract the elements we want by matching on the class attribute
list_comments = bs_comment.find_all('div', class_='comment-content')
#   Iterate over the list, taking each Tag in turn
for tag_comment in list_comments:
    #   Print the text of the comment
    print(tag_comment.text)
'''
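
Both versions above call requests.get without a timeout and never check the HTTP status. Below is a minimal sketch of a slightly more defensive variant. The helper names fetch_html and extract_comments and the 10-second timeout are assumptions made for illustration, not part of the original exercise. Keeping the download separate from the parsing also means the parsing step can be tried on a literal HTML string, with no network access.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (hypothetical helper)."""
    res = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx responses instead of parsing an error page
    res.raise_for_status()
    return res.text


def extract_comments(html):
    """Return the text of every <div class="comment-content"> (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    comments = []
    for item in soup.find_all('div', class_='comment-content'):
        # get_text(' ', strip=True) trims whitespace and joins several <p> tags with a space
        comments.append(item.get_text(' ', strip=True))
    return comments
```

With the article URL from the exercise, looping over extract_comments(fetch_html(url)) and printing each entry should give one comment per line, assuming the page is still reachable.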

Each Tag in items has the following content:

<div class="comment-content">
<p>第1个蜘蛛侠</p>
</div>
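
The two solutions read the text differently: the first uses item.find('p').text, while the teacher's uses tag_comment.text on the whole <div>. The small offline sketch below runs both on the snippet above (the variable names are illustrative). It suggests that .text on the <div> also picks up the newlines around the <p>, so the teacher's version prints extra blank lines around each comment.

```python
from bs4 import BeautifulSoup

# The sample Tag shown above, as a literal string (no network access needed)
sample = '<div class="comment-content">\n<p>第1个蜘蛛侠</p>\n</div>'

tag = BeautifulSoup(sample, 'html.parser').find('div', class_='comment-content')

whole_div_text = tag.text                 # text of the <div>, including the newlines around <p>
first_p_text = tag.find('p').text         # text of the first <p> only
stripped_text = tag.get_text(strip=True)  # whitespace-trimmed text of the whole <div>

print(repr(whole_div_text))  # '\n第1个蜘蛛侠\n'
print(repr(first_p_text))    # '第1个蜘蛛侠'
print(repr(stripped_text))   # '第1个蜘蛛侠'
```

Using get_text(strip=True) is one way to get clean output without assuming every comment contains a <p>.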


Original: https://www.cnblogs.com/www1707/p/10692298.html
