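chardet is an excellent encoding-detection module. It is a third-party Python library, so it has to be downloaded and installed before use.
Documentation: https://pypi.org/project/chardet/
It cannot recognize every encoding, though; see the documentation for the list of encodings it supports.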
pip install chardet
import chardet
rawdata = b'sdfwe'
res = chardet.detect(rawdata)
print(res)
Output:
{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
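The result dictionary is usually only a means to an end: decoding the bytes. A minimal sketch of that step follows. The helper name decode_with_guess and the utf-8 fallback are assumptions made for illustration and are not part of chardet itself.

```python
import chardet


def decode_with_guess(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes with chardet's guess (hypothetical helper, not a chardet API)."""
    guess = chardet.detect(raw)
    # 'encoding' may be None when chardet cannot make a guess,
    # so fall back to an assumed default in that case.
    encoding = guess["encoding"] or fallback
    # errors="replace" avoids an exception if the guess turns out to be wrong.
    return raw.decode(encoding, errors="replace")


text = decode_with_guess(b"sdfwe")
print(text)
```

Because the detection is a guess, it is worth checking the 'confidence' value before trusting the decoded text for anything important.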
chardet comes with a command-line script which reports on the encodings of one or more files:
% chardetect somefile someotherfile
somefile: windows-1252 with confidence 0.5
someotherfile: ascii with confidence 1.0
Character encoding detection means taking a sequence of bytes in an unknown character encoding and attempting to determine the encoding so you can read the text. It’s like cracking a code when you don’t have the decryption key.
Put simply, it takes a small part of the object and guesses the encoding from the characteristics of that sample.
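One place where this "look at part of the data" idea is visible is chardet's UniversalDetector class, which accepts data piece by piece and can stop once it is confident. The sketch below is only an illustration: the helper name detect_incrementally and the sample chunks are assumptions, while feed(), done, close() and result are the detector's own interface.

```python
from chardet.universaldetector import UniversalDetector


def detect_incrementally(chunks):
    """Feed byte chunks to the detector and stop early once it is confident."""
    detector = UniversalDetector()
    for chunk in chunks:
        detector.feed(chunk)
        # 'done' becomes True when the detector has seen enough data.
        if detector.done:
            break
    # close() finalizes the guess; 'result' has the same shape as chardet.detect().
    detector.close()
    return detector.result


result = detect_incrementally([b"hello ", b"world"])
print(result)
```

For a large file, the chunks could come from reading it line by line in binary mode, so the whole file never has to be loaded just to guess its encoding.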
Original article: https://www.cnblogs.com/wodeboke-y/p/9938709.html