Web scraping: using Selenium to scrape Douyu streamer images

Posted: 2019-10-26 17:17:37

A short introduction to Selenium and its basic usage is on my GitHub: https://github.com/bwyt/spider/tree/master/selenium%E5%9F%BA%E7%A1%80

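```python
"""
1. Send the request
    1.1 Create the driver object
2. Make sure the whole page is rendered
    2.1 Maximize the window
    2.2 Scroll down (so that every position on the page gets loaded)
3. Get the list of all li tags
    3.1 Iterate over the li tags, extracting each image URL and streamer name
    3.2 Save the images
4. Go to the next page
"""
import os
import re
import time

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

# Create the output folder up front; open() fails if it does not exist
os.makedirs('images', exist_ok=True)

# Create the driver object
driver = webdriver.Chrome()
# Maximize the window first
driver.maximize_window()
# Then open the target page
driver.get('https://www.douyu.com/g_hpjy')
while True:
    time.sleep(2)
    # Scroll down (so that every position on the page gets loaded)
    for i in range(2):
        driver.execute_script('window.scrollTo(0, {})'.format(i * 500))
        time.sleep(1)
    # Get the li tags of all cover images on the current page
    lis = driver.find_elements(By.XPATH, '//ul[@class="layout-Cover-list"]/li')
    # Iterate over the li tags, extracting each image URL and streamer name
    for li in lis:
        img_url = li.find_element(By.XPATH, './/a[1]/div/div[1]/img').get_attribute('src')
        name = li.find_element(By.XPATH, './/h2').text
        # Skip covers whose image URL has not been filled in yet
        if not img_url:
            continue
        # Replace characters that are not allowed in file names
        safe_name = re.sub(r'[\\/:*?"<>|]', '_', name)
        # Save the image, named after the streamer
        response = requests.get(img_url)
        data = response.content
        file = os.path.join('images', safe_name + '.webp')
        with open(file, 'wb') as f:
            f.write(data)
    try:
        # Go to the next page; if the button cannot be found or clicked,
        # treat it as the last page and stop
        driver.find_element(By.XPATH, '//li[@class=" dy-Pagination-next"]').click()
    except Exception as e:
        print(e)
        break
time.sleep(5)
# quit() closes the browser and ends the WebDriver session
driver.quit()
```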
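The `for i in range(2)` loop only scrolls to 0 px and 500 px, so covers further down the page may not have their `src` filled in yet. Below is a minimal sketch of a more thorough scroll, assuming the page lazy-loads images as you scroll. The helper name `scroll_to_bottom` and its `step`, `pause` and `max_rounds` defaults are hypothetical and were not tuned for Douyu. The helper keeps stepping down until `document.body.scrollHeight` stops growing. It relies only on `driver.execute_script`, which in Selenium returns the value of a JavaScript `return` statement.

```python
import time


def scroll_to_bottom(driver, step=500, pause=1.0, max_rounds=50):
    """Scroll down in fixed steps until the page height stops growing.

    `driver` is assumed to be a Selenium WebDriver (anything whose
    execute_script returns the value of a JavaScript `return` statement).
    Returns the list of y offsets that were scrolled to.
    """
    offsets = []
    y = 0
    height = driver.execute_script('return document.body.scrollHeight')
    # max_rounds guards against pages that keep growing forever
    for _ in range(max_rounds):
        # Walk down one step at a time so lazy-loaded images get a chance to render
        while y < height:
            y += step
            driver.execute_script('window.scrollTo(0, {})'.format(y))
            offsets.append(y)
            time.sleep(pause)
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == height:
            # Nothing new was appended to the page: the bottom has been reached
            break
        height = new_height
    return offsets
```

In the script above, calling `scroll_to_bottom(driver)` in place of the `for i in range(2):` loop is one way to use it. Stepping down gradually, rather than jumping straight to the bottom, gives each cover a moment inside the viewport, which lazy loaders typically need before they set the image URL.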

Original post: https://www.cnblogs.com/zry-yt/p/11743818.html