Python Web Scraping Notes

Date: 2020-05-18 23:58:41

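1. A common task in web data collection is to grab an HTML table and write it to a CSV file. Wikipedia's "Comparison of text editors" article (https://en.wikipedia.org/wiki/Comparison_of_text_editors) uses many complex HTML tables, with colors, links, sorting, and other HTML elements that have to be discarded before the data is written to CSV. With BeautifulSoup and the get_text() function, you can do this in a dozen or so lines of code: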

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://en.wikipedia.org/wiki/Comparison_of_text_editors")
# Name the parser explicitly so the result does not depend on what is installed
bsObj = BeautifulSoup(html, "html.parser")
# The main comparison table is the first table on the page
table = bsObj.findAll("table", {"class": "wikitable"})[0]
rows = table.findAll("tr")
csvFile = open("C:/Users/Administrator/Desktop/test2.csv", "wt", newline="", encoding="utf-8")
writer = csv.writer(csvFile)
try:
    for row in rows:
        csvRow = []
        for cell in row.findAll(["td", "th"]):
            csvRow.append(cell.get_text())
        # Write each row once, after all of its cells have been collected
        writer.writerow(csvRow)
finally:
    csvFile.close()
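
The strings returned by get_text() for table cells often still carry trailing newlines, and on Wikipedia they may include footnote markers such as [1]. The following is only a minimal sketch of tidying cell text before it reaches csv.writer. The helpers clean_cell and rows_to_csv are hypothetical names that do not appear in the original post, and the sample rows are invented placeholders standing in for real get_text() results.

```python
import csv
import io
import re

# Matches bracketed numeric footnote markers such as "[1]" or "[23]".
FOOTNOTE = re.compile(r"\[\d+\]")


def clean_cell(text):
    """Drop footnote markers and collapse runs of whitespace in one cell."""
    without_notes = FOOTNOTE.sub("", text)
    return " ".join(without_notes.split())


def rows_to_csv(rows):
    """Write already-extracted rows (lists of cell strings) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        # One writerow() call per table row, after every cell is cleaned.
        writer.writerow([clean_cell(cell) for cell in row])
    return buffer.getvalue()


# Invented placeholder rows (assumption), shaped like cell.get_text() output.
sample_rows = [
    ["Name\n", "License\n", "Cost\n"],
    ["Editor A\n", "Example License[1]\n", "Free\n"],
    ["Editor B\n", "Example License\n", "Free,\n with limits[12]\n"],
]
csv_text = rows_to_csv(sample_rows)
```

In the script above, the same cleanup could be applied by appending clean_cell(cell.get_text()) instead of the raw text. Note that csv.writer quotes any cell that contains a comma, so such values still occupy a single column.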

Original: https://www.cnblogs.com/yeu4h3uh2/p/12913893.html
