Handling Unicode when scraping web pages with Python 3

Posted: 2016-08-04 14:51:23

import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
html = response.read()
print(html)

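The code above runs without errors, but any Chinese text in the output shows up as escape sequences such as \xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80. That is because response.read() returns a bytes object, and printing bytes shows each non-ASCII byte as a \x escape.

In other words, Python 3 gives you the raw bytes rather than a readable string, so they have to be decoded first.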
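To make the difference concrete, here is a small sketch that decodes the exact byte sequence quoted above. It assumes the bytes are UTF-8 encoded, which is the case for this sequence; the variable names are only illustrative.

```python
# The escape sequence from the output above, written as a bytes literal.
raw = b"\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80"

# Assumption: the bytes are UTF-8. Each Chinese character here takes 3 bytes.
text = raw.decode("utf-8")

print(type(raw).__name__, len(raw))    # bytes object, length counted in bytes
print(type(text).__name__, len(text))  # str object, length counted in characters
print(text)
```

The same nine bytes become three characters once decoded, which is why the raw output looks so much longer than the text it represents.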

Use str(bytes_object, encoding) to convert the bytes into a string:

str(response.read(), 'utf-8')
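As a quick sanity check, the sketch below compares str() with an encoding argument against the bytes.decode() method, and shows what happens if the encoding is left out. The sample text is an assumption chosen for illustration and no network access is needed.

```python
# Sample bytes, produced locally so the example runs offline.
raw = "百度".encode("utf-8")

# Passing an encoding to str() decodes the bytes...
decoded_with_str = str(raw, "utf-8")

# ...and gives the same result as calling decode() on the bytes.
decoded_with_method = raw.decode("utf-8")

# Without an encoding, str() does NOT decode: it returns the repr,
# i.e. text that starts with b' and still contains \x escapes.
repr_text = str(raw)

print(decoded_with_str == decoded_with_method)
print(repr_text)
```

The encoding argument is what makes the difference: leaving it out is a common way to end up with the same unreadable output as before.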

import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')
html = str(response.read(), 'utf-8')
print(html)
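Hard-coding 'utf-8' works for pages that are actually served as UTF-8. For other pages, one possible variation is to read the charset from the Content-Type header and fall back to UTF-8 when it is missing or unusable. This is only a sketch: decode_body and fetch_text are hypothetical helper names, and the fallback policy is an assumption rather than part of the original post.

```python
import urllib.request


def decode_body(body, charset=None, fallback="utf-8"):
    """Decode bytes with the declared charset, else with the fallback (hypothetical helper)."""
    try:
        return body.decode(charset or fallback)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or undecodable bytes: decode with the fallback
        # and substitute U+FFFD for anything that still cannot be decoded.
        return body.decode(fallback, errors="replace")


def fetch_text(url, timeout=10):
    """Fetch a URL and return its body as str (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # get_content_charset() returns None if the header declares no charset.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


if __name__ == "__main__":
    print(fetch_text("http://www.baidu.com"))
```

Keeping the decoding step in its own function means it can be checked without touching the network.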

With this change the page prints as readable text.

Source: http://www.cnblogs.com/hlssz/p/5736533.html
