Encoding issues in Python

Date: 2019-03-23 18:50:32

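How Python 3 executes code:

The interpreter locates the source file (stored on disk in some encoding such as UTF-8 or GBK),
decodes the source bytes into memory using the encoding declared at the top of the file (UTF-8 when nothing is declared), turning them into Unicode.
Every string value is then held as Unicode (a str is a sequence of Unicode code points).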
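The decoding step above can be imitated by hand. The sketch below is only an illustration, not what the interpreter literally runs: the file contents are a made-up example, and it uses the standard library's tokenize.detect_encoding to read the coding declaration before decoding the bytes.

```python
import io
import tokenize

# Hypothetical source file contents, as they would sit on disk in GBK.
source_bytes = "# -*- coding: gbk -*-\ns = '中'\n".encode("gbk")

# Read the encoding declared in the file header.
declared, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)

# Decode the raw bytes into a str (Unicode) using that encoding.
source_text = source_bytes.decode(declared)

print(declared)            # gbk
print(type(source_text))   # <class 'str'>
```

If the header were missing, detect_encoding would fall back to UTF-8, and these GBK bytes would then fail to decode.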

Unicode text in this form exists only in memory. To store or transmit it, it must first be encoded into bytes with an encoding such as UTF-8 or GBK.
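As a small sketch of what "storing" means here, the following writes a str to a temporary file as UTF-8 and reads the raw bytes back. The file name demo.txt is an arbitrary choice for the example.

```python
import os
import tempfile

text = "中文"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # arbitrary file name for the demo

    # Writing in text mode encodes the str to bytes on the way out.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode shows what is actually stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading in text mode decodes the bytes back into a str.
    with open(path, "r", encoding="utf-8") as f:
        restored = f.read()

print(raw)        # b'\xe4\xb8\xad\xe6\x96\x87'
print(restored)   # 中文
```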

The difference between str and bytes is how the data is represented:

str (Unicode)      ==>     bytes (UTF-8/GBK..)       ==>         storage, transmission
b = s.encode('utf-8')     # encode: str -> bytes
s = b.decode('utf-8')     # decode: bytes -> str
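A short round trip may make the two directions concrete. This is a minimal sketch; it also shows what tends to go wrong when bytes are decoded with a different encoding from the one used to produce them.

```python
s = "中文"

utf8_bytes = s.encode("utf-8")   # str -> bytes, 3 bytes per character here
gbk_bytes = s.encode("gbk")      # str -> bytes, 2 bytes per character here

print(len(utf8_bytes), len(gbk_bytes))   # 6 4

# Decoding with the matching encoding restores the original str.
print(utf8_bytes.decode("utf-8") == s)   # True
print(gbk_bytes.decode("gbk") == s)      # True

# Decoding with the wrong encoding fails (or, in other cases, yields garbage).
mismatch_error = None
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)     # UnicodeDecodeError
```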

How str and bytes are displayed and encoded in Python 3:

English:
    str:    displayed as ==> 'a'
            encoded as   ==> 0101      Unicode

    bytes:  displayed as ==> b'a'
            encoded as   ==> 0101      UTF-8/GBK..


Chinese:
    str:    displayed as ==> '中'
            encoded as   ==> 0101      Unicode

    bytes:  displayed as ==> b'\xe4\xb8\xad'   (UTF-8)
            encoded as   ==> 0101      UTF-8/GBK..
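The table above can be checked interactively. The sketch below prints the Unicode code point of a character next to its byte form in two encodings.

```python
ch = "中"

code_point = hex(ord(ch))        # the Unicode code point of the character
as_utf8 = ch.encode("utf-8")     # the same character as UTF-8 bytes
as_gbk = ch.encode("gbk")        # the same character as GBK bytes

print(code_point)                # 0x4e2d
print(as_utf8)                   # b'\xe4\xb8\xad'
print(as_gbk)                    # b'\xd6\xd0'

# ASCII letters keep a readable form inside a bytes literal.
print("a".encode("utf-8"))       # b'a'

# One character in the str, three bytes once encoded as UTF-8.
print(len(ch), len(as_utf8))     # 1 3
```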

In Python 2:

u'xxx' is a unicode object, the equivalent of str in Python 3.
bytes and str are the same type.

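# -*- coding: utf-8 -*-
# Python 2 code; the coding line is needed because the source contains non-ASCII text
s = 'a'
print(s, type(s))               # ('a', <type 'str'>)


s = u'中文'
print(s, type(s))               # (u'\u4e2d\u6587', <type 'unicode'>)
# Encode to UTF-8: each of these Chinese characters takes three bytes
s1 = s.encode('utf-8')
print(s1, type(s1))             # ('\xe4\xb8\xad\xe6\x96\x87', <type 'str'>)


# bytes and str are the same type
s1 = 'a'
s2 = bytes('a')
print(s1 is s2)                 # True
print(bytes is str)             # True, bytes is just an alias for str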
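For contrast, here is a rough sketch of how the same checks behave in Python 3, where bytes and str are separate types and are never mixed implicitly.

```python
# Python 3: bytes and str are distinct types.
same_type = bytes is str
print(same_type)                 # False

# Mixing them without an explicit encode/decode raises TypeError.
concat_error = None
try:
    b"a" + "a"
except TypeError as exc:
    concat_error = exc
print(type(concat_error).__name__)   # TypeError

# An explicit decode makes the comparison meaningful.
decoded_equal = b"a".decode("ascii") == "a"
print(decoded_equal)             # True
```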


Original: https://www.cnblogs.com/caihuajiaoshou/p/10585032.html
