Fixing garbled text in downloaded web pages while learning web scraping

Date: 2020-02-22 13:02:06

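This problem is almost certainly caused by a character-encoding mismatch: the page's bytes are decoded with a different encoding than the one they were written in. There are plenty of solutions online. Of the many I have read, this one is the best.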

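The original article explains the cause clearly and covers the theory well. I have recorded its solutions below. Copyright belongs to the original author.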
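To make the cause concrete, here is a minimal sketch using only the standard library. It assumes the page is GBK-encoded and that the client decodes it as ISO-8859-1, which, per the requests documentation, is the fallback when the Content-Type header is a text type with no charset. The sample string is made up for illustration.

```python
# Bytes as a GBK-encoded server might send them (the sample text is made up).
raw = "职位搜索".encode("gbk")

# ISO-8859-1 accepts every byte value, so the wrong decode never raises;
# it silently produces one garbled character per byte.
garbled = raw.decode("iso-8859-1")

# Decoding with the encoding the page was actually written in restores the text.
correct = raw.decode("gbk")

print(repr(garbled))
print(correct)
```

All three methods below come down to the same thing: making sure the bytes are decoded as GBK instead of ISO-8859-1.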

Method 1: Set res.encoding directly

import requests

url = "http://search.51job.com"

res = requests.get(url)

res.encoding = "gbk"  # tell requests which encoding to use when it builds res.text

html = res.text

print(html)

Method 2: Set it from the res.apparent_encoding attribute

import requests

url = "http://search.51job.com"

res = requests.get(url)

res.encoding = res.apparent_encoding  # use the encoding detected from the response body

html = res.text

print(html)
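Because apparent_encoding is a guess made from the body, it may be safer to use it only when the server did not declare a real charset. The helper below is a rough sketch of that idea, not something from the original article. The name decoded_text and the utf-8 fallback are my own assumptions. It accepts any object with the content, encoding and apparent_encoding attributes that a requests response has, so it can be tried without a network connection.

```python
from types import SimpleNamespace


def decoded_text(res, fallback="utf-8"):
    """Decode res.content, trusting the detected encoding only when needed.

    Hypothetical helper: `res` is assumed to expose .content (bytes),
    .encoding and .apparent_encoding, as requests.Response does.
    """
    declared = (res.encoding or "").lower()
    if declared in ("", "iso-8859-1"):
        # No charset, or only the ISO-8859-1 default: fall back to detection.
        encoding = res.apparent_encoding or fallback
    else:
        # The server declared a charset explicitly, so keep it.
        encoding = res.encoding
    return res.content.decode(encoding, errors="replace")


# Stand-in for a real response, so the sketch runs offline.
fake = SimpleNamespace(
    content="职位搜索".encode("gbk"),
    encoding="ISO-8859-1",
    apparent_encoding="gbk",
)
print(decoded_text(fake))
```

With a real response the call would be decoded_text(requests.get(url)).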

Method 3: Re-encode, then decode

import requests

url = "http://search.51job.com"

res = requests.get(url)

html = res.text.encode('iso-8859-1').decode('gbk')  # recover the raw bytes, then decode them as GBK

print(html)
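Method 3 works because ISO-8859-1 maps every byte value to exactly one character, so encoding the garbled string back to ISO-8859-1 returns the original bytes unchanged. The sketch below shows that round trip with the standard library only. The function name repair_mojibake and the sample text are my own, for illustration.

```python
def repair_mojibake(text, wrong="iso-8859-1", right="gbk"):
    """Undo a decode done with `wrong` and redo it with `right`."""
    original_bytes = text.encode(wrong)  # get back the exact bytes received
    return original_bytes.decode(right)  # decode them with the real encoding


raw = "职位搜索".encode("gbk")        # what the server sent (made-up sample)
garbled = raw.decode("iso-8859-1")   # what a wrong decode produces

# The round trip through ISO-8859-1 is lossless.
assert garbled.encode("iso-8859-1") == raw

print(repair_mojibake(garbled))
```

Since res.content already holds the raw bytes, res.content.decode('gbk') should give the same result without the intermediate step.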

Source: https://www.cnblogs.com/xiaolee-tech/p/12344592.html
