[Python Notes] Exploring CPython's implementation details of int objects through a "weird" case

Date: 2015-03-28 20:26:48

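1. Python's object model
As we know, in Python everything is an object. According to the Data Model section of the official Python documentation, every Python object has three properties: an identity, a type, and a value.
This overview of the object model in the official documentation is so important for understanding Python objects that it is quoted below (split into numbered paragraphs here for clarity):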
1) Every object has an identity, a type and a value.
2) An object's identity never changes once it has been created; you may think of it as the object's address in memory. The 'is' operator compares the identity of two objects; the id() function returns an integer representing its identity (currently implemented as its address).
3) An object's type is also unchangeable. An object's type determines the operations that the object supports (e.g., "does it have a length?") and also defines the possible values for objects of that type. The type() function returns an object's type (which is an object itself).
4) The value of some objects can change. Objects whose value can change are said to be mutable; objects whose value is unchangeable once they are created are called immutable. (The value of an immutable container object that contains a reference to a mutable object can change when the latter's value is changed; however the container is still considered immutable, because the collection of objects it contains cannot be changed. So, immutability is not strictly the same as having an unchangeable value, it is more subtle.)
5) An object's mutability is determined by its type; for instance, numbers, strings and tuples are immutable, while dictionaries and lists are mutable.
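To summarize:
1) Every Python object has three properties: an identity, a type, and a value
2) Once an object has been created, its identity (think of it as the object's memory address) never changes. The built-in function id() returns an object's id value, and the is operator checks whether two names refer to the same object
3) The type of an existing object cannot be changed; the type determines which operations can be applied to the object and which values it can hold
4) The value of some objects (e.g. list/dict) can be modified; these are called mutable objects. The value of others (e.g. numbers/strings/tuples) cannot be modified once they are created, so they are called immutable objects
5) Whether an object's value can be modified is determined by its type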
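These five points can be checked directly in the interpreter. The following Python 3 sketch (variable names are arbitrary) shows identity via id() and is, type via type(), in-place mutation of a list, and the subtle case from point 4) of the quoted documentation: an immutable tuple that refers to a mutable list.

```python
# Identity, type, value and mutability, illustrated with built-in types.
nums = [1, 2, 3]              # list: a mutable type
alias = nums                  # a second name bound to the same object

print(id(nums) == id(alias))  # True: both names share one identity
print(nums is alias)          # True: `is` compares identities
print(type(nums))             # <class 'list'>: the type is itself an object

nums.append(4)                # change the value in place
print(alias)                  # [1, 2, 3, 4]: visible through the other name
print(nums is alias)          # True: mutation does not change identity

t = (nums, "x")               # tuple: an immutable container
try:
    t[1] = "y"                # the tuple's slots cannot be rebound
except TypeError as exc:
    print("TypeError:", exc)
nums.append(5)                # ...but the list it references can still change
print(t[0])                   # [1, 2, 3, 4, 5]
```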

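2. An example of "modifying" the value of an immutable object
From the description above, number objects are immutable. To make this concrete, consider the following example code.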

>>> x = 2.11
>>> id(x)      
7223328
>>> x += 0.5
>>> x
2.61
>>> id(x)
7223376

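In the code above, x += 0.5 appears to modify the value of the object named x.
In fact, in the underlying implementation x is just a name holding a reference (in CPython, essentially a pointer) to an object; x itself is not a number object.
What actually happens in the code above is: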
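1) A float object with value 2.11 is created (in CPython the literal is stored among the constants of the compiled code, which holds one reference to it)
2) The name x is bound to that object, which adds one more reference to it
3) When "x += 0.5" executes, a new float object with value 2.61 is created, because a float cannot be changed in place, and x is rebound to this new object. The new object is now referenced by x, while the first object loses the reference that x held (and is freed once its reference count drops to zero)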
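So the code above does not modify the value of any object named x; the identifier x is simply rebound to a newly created object, which makes it look as if its value had been "modified".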
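The difference between rebinding and in-place modification becomes visible if we keep a second reference to the original object before the augmented assignment. A minimal Python 3 sketch; the list part is added only for contrast (list is mutable, so += extends the same object):

```python
x = 2.11
before = x            # keep a second reference to the original float object
x += 0.5              # float is immutable: a new object is created, x is rebound
print(x)              # 2.61
print(x is before)    # False: x now refers to a different object
print(before)         # 2.11: the original object was not modified

lst = [1, 2]
lst_before = lst
lst += [3]            # list is mutable: += extends the same object in place
print(lst is lst_before)  # True
print(lst_before)     # [1, 2, 3]
```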

3. A "weird" case
Given the explanation above, how should the following case be understood?

>>> a = 20
>>> b = 20
>>> id(a)
7151888
>>> id(b)
7151888
>>> a is b
True

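By the reasoning above, a and b should refer to two separately created objects, so their id values should differ.
However, the fact that id(a) == id(b) holds and that "a is b" prints "True" shows that the CPython interpreter clearly does not behave the way we expected.
Is this a bug in the interpreter?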

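4. Uncovering the cause in CPython's PyIntObject source code
In fact, the unexpected behavior above comes from an optimization CPython makes in its implementation of the PyIntObject type.
Section 4.5.2 of Core Python Programming notes:
Because integer objects are immutable, Python caches them for efficiency, which can create the illusion that Python did not create a new object when we think it should have.
This is exactly the underlying cause of the "weird" case we just ran into.
To verify this, I looked at the CPython v2.7 source code hosted on GitHub (cpython/Objects/intobject.c) and found the following code: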

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS   257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS   5
#endif
#if NSMALLNEGINTS + NSMALLPOSINTS > 0
/* References to small integers are saved in this array so that they
   can be shared.
   The integers that are saved are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
#endif

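As we can see, the interpreter's int implementation does declare a small_ints array for caching small integers. From the macro definitions and the comment, the cached range is [-5, 257), that is, -5 through 256 inclusive.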
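The cache boundaries can be probed from Python code. This sketch assumes CPython (Python 3 keeps an equivalent small-int cache for its int type, with the same -5 to 256 range); the helper fresh is hypothetical and computes each value at run time so the compiler cannot merge the two values into one shared constant. Other implementations, such as PyPy, may report different results for is.

```python
def fresh(value):
    """Hypothetical helper: build an int equal to `value` at run time."""
    one = 1
    # Two run-time operations; the final + produces the result object, which
    # CPython takes from its small-int cache when the value is in range.
    return (value - one) + one

for v in (-6, -5, 0, 256, 257):
    # Two independent computations of the same value: same object only if cached.
    print(v, fresh(v) is fresh(v))
```

On CPython this prints True only for -5, 0 and 256, the values inside the cached range.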
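Searching the source file for "small_ints" also shows that the array is used by four functions: _PyInt_Init, PyInt_FromLong, PyInt_ClearFreeList, and PyInt_Fini.
The last two deal with releasing resources and do not concern us here. _PyInt_Init constructs the small int objects and stores them in the small_ints array. In PyObject * PyInt_FromLong(long ival), if the requested value is a small int (that is, ival falls within the small int range), the cached object from small_ints is returned directly; if ival is outside that range, a new object is created and a reference to it is returned.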
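The caching logic just described can be modeled in a few lines of Python. This is a hypothetical sketch, not CPython code: BoxedInt stands in for PyIntObject, and small_ints and from_long only mirror the roles of the C array and PyInt_FromLong. The real C function also increments the cached object's reference count before returning it and allocates non-cached ints from a free list; both details are omitted here.

```python
NSMALLPOSINTS = 257   # same values as the macros in intobject.c
NSMALLNEGINTS = 5


class BoxedInt:
    """Hypothetical stand-in for PyIntObject: a boxed integer value."""
    __slots__ = ("ival",)

    def __init__(self, ival):
        self.ival = ival


# Mirrors the role of _PyInt_Init: pre-build one object per small value.
small_ints = [BoxedInt(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]


def from_long(ival):
    """Mirrors the idea of PyInt_FromLong: reuse cached objects for small values."""
    if -NSMALLNEGINTS <= ival < NSMALLPOSINTS:
        # Index 0 holds -NSMALLNEGINTS, so shift the value by NSMALLNEGINTS.
        return small_ints[ival + NSMALLNEGINTS]
    return BoxedInt(ival)  # outside the range: always a new object
```

With this model, from_long(20) is from_long(20) is True, like a is b for 20 in section 3, while two calls with 1000 return distinct objects.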

At this point we have a rough picture of how the CPython interpreter implements int objects, and we know the cause of the weird case we ran into.
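In interactive mode we have seen that CPython does cache small integer objects. In fact, when compiling a .py script into bytecode, CPython also performs other optimizations that the documentation does not describe; the StackOverflow post Weird Integer Cache inside Python 2.6 explains them in detail and is well worth reading.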
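A hedged illustration of one such compile-time effect, assuming CPython 3: equal constants that appear in the same compiled unit are stored once in the resulting code object, so both names end up bound to the same object even for a value outside the small-int cache. Entering a = 1000 and b = 1000 on separate interactive lines usually gives separate objects instead, since each line is compiled on its own. The unit name "<one-unit>" is arbitrary.

```python
ns = {}
# Both assignments are compiled together; the constant 1000 is stored once
# in the resulting code object, so both names get the same object.
exec(compile("a = 1000\nb = 1000", "<one-unit>", "exec"), ns)
print(ns["a"] is ns["b"])    # True on CPython

# Values computed at run time are separate objects outside the cache range.
n = int("999")
c = n + 1
d = n + 1
print(c == d, c is d)        # True False on CPython
```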
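In short, we cannot control the interpreter's implementation details, but when writing applications we must make sure that our logic never depends on the interpreter caching small integers, or we risk running into baffling bugs.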
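In practice this means comparing integers by value with == rather than by identity with is. A minimal sketch; the names LIMIT, reached_limit_buggy and reached_limit are hypothetical. (Since Python 3.8, using is directly against a literal, as in count is 1000, also triggers a SyntaxWarning.)

```python
LIMIT = 1000  # hypothetical threshold, outside the small-int cache


def reached_limit_buggy(count):
    # Wrong: tests object identity, which depends on interpreter caching
    return count is LIMIT


def reached_limit(count):
    # Right: compares integer values
    return count == LIMIT


count = int("1000")  # built at run time: equal value, distinct object
print(reached_limit_buggy(count), reached_limit(count))  # False True on CPython
```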

[References]
1. Python Doc: Data model
2. Section 4.5.2 of Core Python Programming
3. Python Doc: Plain Integer Objects - PyInt_FromLong(long ival)
4. GitHub Repo - CPython Source Code: CPython/2.7/Objects/intobject.c
5. StackOverflow: Weird Integer Cache inside Python 2.6

===================== EOF =======================

Original article: http://blog.csdn.net/slvher/article/details/44704407
