[Python] Web Scraper Example: Shenzhen University News Headlines

Date: 2020-06-09 18:00:39

# Single page

import requests
from lxml import etree

# 1. Fetch the page
url = "https://www.szu.edu.cn/index/mtsd.htm"
response = requests.get(url)
response.encoding = "utf-8"
wb_data = response.text
html = etree.HTML(wb_data)
# print(wb_data)

# 2. Locate the data: the link text of every item in the news list
infos = html.xpath("//ul[@class='news-list']/li/a/text()")
for info in infos:
    print(info)
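
To see what the XPath expression selects without touching the network, here is a minimal sketch run against a made-up HTML fragment. The fragment and its headlines are assumptions for illustration, not the real page markup. The sketch also swaps lxml for the standard library's xml.etree.ElementTree, which supports only a subset of XPath, so the trailing text() step is replaced by reading each element's .text attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the list the scraper targets.
SAMPLE = """
<div>
  <ul class="news-list">
    <li><a href="a.htm">Headline A</a></li>
    <li><a href="b.htm">Headline B</a></li>
  </ul>
  <ul class="other-list">
    <li><a href="c.htm">Not a news item</a></li>
  </ul>
</div>
"""

root = ET.fromstring(SAMPLE)
# Same path as in the scraper, minus text(): only links inside
# <ul class="news-list"> match, so the second list is skipped.
links = root.findall(".//ul[@class='news-list']/li/a")
titles = [a.text for a in links]
for title in titles:
    print(title)
```

With lxml on the real page, the original expression returns the strings directly because it ends in text().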


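# Multiple pages
import csv

import requests
from lxml import etree

# 1. Build the page URLs: the index page first, then 45.htm down to 1.htm
urls = ["https://www.szu.edu.cn/index/mtsd/{}.htm".format(i) for i in range(1, 46)]
urls.append("https://www.szu.edu.cn/index/mtsd.htm")
urls = urls[::-1]
# 2. Open the output file. "ANSI" resolves to the Windows system code page
#    and is not available on other platforms; newline="" lets the csv
#    module control line endings, which avoids blank rows on Windows.
f = open("深大爬虫.csv", "w", encoding="ANSI", newline="")
fileheader = ["标题"]  # single column; "标题" means "title"
dict_writer = csv.DictWriter(f, fileheader)
dict_writer.writeheader()
# Fetch each page in turn
for url in urls:
    response = requests.get(url)
    response.encoding = "utf-8"
    wb_data = response.text
    html = etree.HTML(wb_data)
    # 3. Locate the data on this page
    infos = html.xpath("//ul[@class='news-list']/li/a/text()")
    for info in infos:
        dict_writer.writerow({"标题": info})
f.close()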
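
The order in which the pages are visited follows from how the URL list is built and then reversed. The sketch below repeats that construction with a smaller page count so the result is easy to read. The helper name build_page_urls and its last_page parameter are hypothetical and introduced only for this illustration.

```python
BASE = "https://www.szu.edu.cn/index/mtsd"


def build_page_urls(last_page):
    """Return the index page followed by last_page.htm down to 1.htm."""
    # Numbered pages in ascending order: 1.htm, 2.htm, ...
    urls = ["{}/{}.htm".format(BASE, i) for i in range(1, last_page + 1)]
    # The index page goes last, so that reversing puts it first.
    urls.append(BASE + ".htm")
    return urls[::-1]


page_urls = build_page_urls(3)
for page_url in page_urls:
    print(page_url)
```

Calling build_page_urls(45) reproduces the list used in the scraper: the index page, then 45.htm, 44.htm, and so on down to 1.htm.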


# Write the file as ANSI rather than UTF-8; otherwise the text is garbled when opened in Excel.
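
A possible alternative, not part of the original post, is to keep UTF-8 but write a byte order mark with Python's "utf-8-sig" codec. Excel generally uses the mark to detect UTF-8, and the codec works on any platform, whereas "ANSI" is only available on Windows. The sketch below is an assumption-laden illustration: the function write_titles, the temporary file name, and the sample titles are all made up.

```python
import codecs
import csv
import os
import tempfile


def write_titles(path, titles, encoding="utf-8-sig"):
    """Write one title per row under a single "标题" (title) column."""
    # newline="" lets the csv module control line endings.
    with open(path, "w", encoding=encoding, newline="") as f:
        writer = csv.DictWriter(f, ["标题"])
        writer.writeheader()
        for title in titles:
            writer.writerow({"标题": title})


# Hypothetical sample data and output location.
demo_path = os.path.join(tempfile.gettempdir(), "szu_titles_demo.csv")
write_titles(demo_path, ["示例标题一", "示例标题二"])

# The file should begin with the three-byte UTF-8 byte order mark.
with open(demo_path, "rb") as f:
    starts_with_bom = f.read(3) == codecs.BOM_UTF8
print(starts_with_bom)
```

Reading the file back with encoding="utf-8-sig" strips the mark again, so the header comes back as plain "标题".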


Source: https://www.cnblogs.com/Skybiubiu/p/13073997.html
