Python Web Scraping Notes (1): Saving a Page and Decoding Characters

Date: 2021-06-22 11:25:40

from urllib.request import urlopen
# Open the URL with urlopen from Python's standard library and get a response
url = "http://www.baidu.com"
resp = urlopen(url)
# read() returns the raw response body as bytes
result = resp.read()


from urllib.request import urlopen
# Open the URL and get a response
url = "http://www.baidu.com"
resp = urlopen(url)
# read() returns bytes; decode() turns them into a str so Chinese text prints correctly
result = resp.read().decode("utf-8")

# Save the decoded page to a local HTML file
with open("mybaidu.html", mode="w", encoding="utf-8") as f:
    f.write(result)

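If the result of read() is printed directly, it is a bytes object (displayed with a b prefix), so it has to be decoded with decode("utf-8") before it reads as normal text.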
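To make the bytes-versus-string difference concrete, here is a minimal sketch that runs without network access. The helper name `decode_body` and the sample bodies are assumptions made for illustration; they are not part of the original post.

```python
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Decode response bytes, falling back to UTF-8 when no charset is given.

    Hypothetical helper for illustration; errors="replace" avoids an
    exception if the guessed charset turns out to be wrong.
    """
    return raw.decode(charset or "utf-8", errors="replace")


# Simulated response bodies, standing in for what resp.read() would return.
utf8_body = "<title>百度一下</title>".encode("utf-8")
gbk_body = "<title>百度一下</title>".encode("gbk")

print(type(utf8_body))               # the raw body is a bytes object
print(utf8_body)                     # printed with a b'...' prefix and escaped bytes
print(decode_body(utf8_body))        # readable text after decoding
print(decode_body(gbk_body, "gbk"))  # a GBK page needs the matching charset
```

With a real response, `resp.headers.get_content_charset()` returns the charset declared in the `Content-Type` header (or `None` if there is none), and that value could be passed in as `charset` instead of hard-coding "utf-8".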


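Packet capture tools

In the request headers:

User-Agent: information about the client, i.e. which browser is making the request

Referer: the page the request came from; servers check it for hotlink protection

Cookie: string data stored locally on the client (often checked by anti-scraping measures)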
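The request headers listed above can be set by hand with `urllib.request.Request`. The following is only a sketch: the helper name `build_request`, the User-Agent string, and the cookie value are placeholders chosen for illustration, and the request is built but not sent.

```python
from urllib.request import Request


def build_request(url, referer=None, cookie=None):
    """Build a Request carrying browser-like headers (hypothetical helper)."""
    headers = {
        # Placeholder browser identity; copy a real one from your own browser.
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    }
    if referer:
        headers["Referer"] = referer   # tells the server where the request came from
    if cookie:
        headers["Cookie"] = cookie     # cookie string the client sends back
    return Request(url, headers=headers)


req = build_request(
    "http://www.baidu.com",
    referer="http://www.baidu.com/",
    cookie="sessionid=example",        # made-up value
)
# Inspect the headers that would be sent; nothing goes over the network here.
print(req.header_items())
```

To actually send it, pass the object to `urlopen(req)` in place of the bare URL. Note that `Request` normalizes header names with `str.capitalize()`, so the key is stored as `User-agent`.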

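In the response headers:

Set-Cookie: cookie data the server asks the client to store locally; the client sends it back in the Cookie request header (used for anti-scraping)

and various other fields.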
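Servers normally deliver cookies in `Set-Cookie` response headers, and the client returns them in a `Cookie` request header. A rough sketch of that round trip using the standard library's `http.cookies.SimpleCookie` follows; the helper name and the sample header values are made up for illustration.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Turn Set-Cookie header values into a single Cookie header string.

    Hypothetical helper: attributes such as Path or HttpOnly are parsed
    but dropped, because only name=value pairs are sent back.
    """
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Made-up Set-Cookie values standing in for a real response.
sample = [
    "sessionid=abc123; Path=/; HttpOnly",
    "theme=dark; Path=/",
]
print(cookie_header_from_set_cookie(sample))
```

With a real response, `resp.headers.get_all("Set-Cookie")` returns the list of values (or `None` when the server set no cookies). For anything beyond a quick experiment, `http.cookiejar` handles this bookkeeping automatically.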


Source: https://www.cnblogs.com/YuyuFishSmile/p/14917067.html
