Fixing garbled files when scraping with Scrapy, the open-source Python project

Date: 2015-04-17 13:24:48

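When Scrapy crawls pages, the saved files can come out garbled. Analysis showed the cause is the encoding: converting the content to UTF-8 before saving fixes it. Code snippet: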

......

import chardet

......

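content_type = chardet.detect(html_content)
# print(content_type['encoding'])
encoding = content_type['encoding']
# chardet may return None, and the case of its encoding names varies,
# so compare case-insensitively and skip conversion when nothing was detected
if encoding and encoding.lower() != "utf-8":
    html_content = html_content.decode(encoding)
    html_content = html_content.encode("utf-8")
with open(filename, "wb") as f:
    f.write(html_content)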

....

The saved file now displays the Chinese text correctly.
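
Encoding detection can fail or guess wrong, so a more defensive conversion may be useful. The following is only a minimal sketch, assuming Python 3: the helper name `to_utf8` and the `gb18030` fallback (a superset of GB2312) are illustrative choices, not part of the original post.

```python
def to_utf8(raw, detected_encoding, fallback="gb18030"):
    """Return the raw bytes re-encoded as UTF-8.

    detected_encoding may be None, for example when a detector gives up.
    The fallback codec is an assumption suited to Chinese pages.
    """
    encoding = detected_encoding or fallback
    try:
        text = raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong guess or unknown codec name: decode with the fallback and
        # substitute undecodable bytes instead of raising.
        text = raw.decode(fallback, errors="replace")
    return text.encode("utf-8")
```

The result can be written with `open(filename, "wb")` exactly as in the snippet above, for example `to_utf8(html_content, content_type["encoding"])`.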

Steps:

First, decode the GB2312-encoded bytes into Unicode.

Then encode the Unicode text as UTF-8.
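
The two steps can be sketched on their own, without Scrapy or chardet. This is an illustrative example assuming Python 3; the sample text and variable names are made up for the demonstration.

```python
# Sample text; these bytes stand in for a page served as GB2312.
text = "中文"
gb2312_bytes = text.encode("gb2312")

# Step 1: decode the GB2312 bytes into a Unicode string.
unicode_text = gb2312_bytes.decode("gb2312")

# Step 2: encode the Unicode string as UTF-8 bytes.
utf8_bytes = unicode_text.encode("utf-8")

# GB2312 uses 2 bytes per Chinese character, UTF-8 uses 3.
print(len(gb2312_bytes), len(utf8_bytes))  # prints: 4 6
```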

Original: http://www.cnblogs.com/Byrd/p/4434463.html
