开博第一篇,学习下glibc的malloc实现,先记录下源码。
一、glibc的malloc移植了ptmalloc,ptmalloc封装了dlmalloc,关系如下:
dlmalloc(通用但并发不安全)--> ptmalloc(多线程优化)-->glibc移植(但是代码改动大)。
所以从本篇先从ptmalloc入手,分析改内存分配、管理的思想。
二、ptmalloc特性
* Vital statistics:
Supported pointer/size_t representation: 4 or 8 bytes
size_t MUST be an unsigned type of the same width as
pointers. (If you are using an ancient system that declares
size_t as a signed type, or need it to be a different width
than pointers, you can use a previous release of this malloc
(e.g. 2.7.2) supporting these.)
支持指针和size_t形式:4或8字节
size_t必须是无符号类型,且位宽和指针相同。(如果你使用有符号size_t的老系统,或者size_t
和指针位宽不一致,可以使用malloc的历史版本版本例如2.7.2支持)。
Alignment: 8 bytes (default)
This suffices for nearly all current machines and C compilers.
However, you can define MALLOC_ALIGNMENT to be wider than this
if necessary (up to 128bytes), at the expense of using more space.
对齐:默认8字节
它几乎支持当前所有的机器和c编译器。除此之外,如果需要,你可以通过定义MALLOC_ALIGNMENT宏,
更改为更大的对齐位宽(最大128字节),代价是会占用更大的内存空间。
Minimum overhead per allocated chunk: 4 or 8 bytes (if 4byte sizes)
8 or 16 bytes (if 8byte sizes)
Each malloced chunk has a hidden word of overhead holding size
and status information, and additional cross-check word
if FOOTERS is defined.
每个chunk块的最小前缀(overhead): 4或8字节(当size_t为4字节)
8或16字节(当size_t为8字节)
每个申请的chunk有一个隐藏的前缀字,用来报错块大小和状态信息,如果定义了FOOTERS,还会额外多
占用一个交叉检查字。
Minimum allocated size: 4-byte ptrs: 16 bytes (including overhead)
8-byte ptrs: 32 bytes (including overhead)
Even a request for zero bytes (i.e., malloc(0)) returns a
pointer to something of the minimum allocatable size.
The maximum overhead wastage (i.e., number of extra bytes
allocated than were requested in malloc) is less than or equal
to the minimum size, except for requests >= mmap_threshold that
are serviced via mmap(), where the worst case wastage is about
32 bytes plus the remainder from a system page (the minimal
mmap unit); typically 4096 or 8192 bytes.
最小申请内存大小(包括前缀):32bit地址:16字节
64bit地址:32字节
即使申请0字节(malloc(0)),实际也会返回一个最小内存指针。最大的前缀浪费(在malloc中
分配的超出请求的字节数)情况是申请少于或等于最小值。除此之外,当申请内存大与mmap_threshold
,将会通过mmap()函数分配内存,这样最大的浪费情况是32字节(mini)加剩余系统页(最小的mmap单元,
典型大小为4096字节或8192字节)。
Security: static-safe; optionally more or less
The "security" of malloc refers to the ability of malicious
code to accentuate the effects of errors (for example, freeing
space that is not currently malloc‘ed or overwriting past the
ends of chunks) in code that calls malloc. This malloc
guarantees not to modify any memory locations below the base of
heap, i.e., static variables, even in the presence of usage
errors. The routines additionally detect most improper frees
and reallocs. All this holds as long as the static bookkeeping
for malloc itself is not corrupted by some other means. This
is only one aspect of security -- these checks do not, and
cannot, detect all possible programming errors.
If FOOTERS is defined nonzero, then each allocated chunk
carries an additional check word to verify that it was malloced
from its space. These check words are the same within each
execution of a program using malloc, but differ across
executions, so externally crafted fake chunks cannot be
freed. This improves security by rejecting frees/reallocs that
could corrupt heap memory, in addition to the checks preventing
writes to statics that are always on. This may further improve
security at the expense of time and space overhead. (Note that
FOOTERS may also be worth using with MSPACES.)
By default detected errors cause the program to abort (calling
"abort()"). You can override this to instead proceed past
errors by defining PROCEED_ON_ERROR. In this case, a bad free
has no effect, and a malloc that encounters a bad address
caused by user overwrites will ignore the bad address by
dropping pointers and indices to all known memory. This may
be appropriate for programs that should continue if at all
possible in the face of programming errors, although they may
run out of memory because dropped memory is never reclaimed.
If you don‘t like either of these options, you can define
CORRUPTION_ERROR_ACTION and USAGE_ERROR_ACTION to do anything
else. And if if you are sure that your program using malloc has
no errors or vulnerabilities, you can define INSECURE to 1,
which might (or might not) provide a small performance improvement.
安全性:静态安全;可自行选择
Malloc的“安全性”是指代码中调用malloc的恶意代码造成的错误影响能力。(例如,释放的
内存不是当前申请的或者写溢出了当前的chunk尾部)。这个malloc保证不会修改堆底部以下
的任何内存位置,例如静态变量,即使存在使用错误。例程另外还检测大多数不正确的释放和
重分配。只要malloc本身的静态标记没有被其他程序破坏,所有这些都是成立的。这只是安全
性的一个方面——这些检查不会也不能检测所有可能的编程错误。
如果FOOTERS被定义为非0,每个分配的chunk块携带一个附加的检查字,以验证它是否被
Malloc。这些检查词在使用malloc的程序的每次执行中都是相同的,但是在不同的执行中是不
同的,因此不能释放外部构造的chunks。这可以通过拒绝可能损坏堆内存的释放/重新分配来提高安
全性,此外还可以防止写入始终处于打开状态的静态数据。这可能会以时间和空间开销为代价进一
步提高安全性。(注意,FOOTERS可能也值得与MSPACES一起使用。)
检测到默认错误的默认动作是调用abort(调用“abort()”)。你可以定义PROCEED_ON_ERROR来
覆盖它继续处理错误。在这种情况下,错误的free没有作用,而malloc遇到由用户覆盖引起的错误
地址时,会忽略错误地址,通过将指针和索引丢到所有已知内存中。这可能适用于那些在可能出现编程
错误的情况下应该继续运行的程序,尽管它们可能会耗尽内存,因为丢弃的内存永远不会被回收。
如果你不喜欢这些选项,你可以定义CORRUPTION_ERROR_ACTION和USAGE_ERROR_ACTION去做
另外的任何事。如果你确信你的程序使用malloc没有错误或漏洞,你可以将INSECURE置1,这样可能
会(也可能不会)有一个小的性能提高。
Thread-safety: NOT thread-safe unless USE_LOCKS defined
When USE_LOCKS is defined, each public call to malloc, free,
etc is surrounded with either a pthread mutex or a win32
spinlock (depending on WIN32). This is not especially fast, and
can be a major bottleneck. It is designed only to provide
minimal protection in concurrent environments, and to provide a
basis for extensions. If you are using malloc in a concurrent
program, consider instead using nedmalloc
(http://www.nedprod.com/programs/portable/nedmalloc/) or
ptmalloc (See http://www.malloc.de), which are derived
from versions of this malloc.
线程安全:不是线程安全的,除非定义USE_LOCKS
当USE_LOCKS被定义是,每个对malloc、free等的公共调用都被pthread互斥锁或win32自旋锁包住
(取决于win32)。这不是特别快的,会成为一个主要的瓶颈。在并发环境中,它设计只提供最少的保护,
为扩展的提供基础版本。如果你在并发环境中使用malloc,考虑使用nedmalloc或者ptmalloc,它们
是从这个malloc衍生出来的版本。
System requirements: Any combination of MORECORE and/or MMAP/MUNMAP
This malloc can use unix sbrk or any emulation (invoked using
the CALL_MORECORE macro) and/or mmap/munmap or any emulation
(invoked using CALL_MMAP/CALL_MUNMAP) to get and release system
memory. On most unix systems, it tends to work best if both
MORECORE and MMAP are enabled. On Win32, it uses emulations
based on VirtualAlloc. It also uses common C library functions
like memset.
Compliance: I believe it is compliant with the Single Unix Specification
(See http://www.unix.org). Also SVID/XPG, ANSI C, and probably
others as well.
* Overview of algorithms
This is not the fastest, most space-conserving, most portable, or
most tunable malloc ever written. However it is among the fastest
while also being among the most space-conserving, portable and
tunable. Consistent balance across these factors results in a good
general-purpose allocator for malloc-intensive programs.
In most ways, this malloc is a best-fit allocator. Generally, it
chooses the best-fitting existing chunk for a request, with ties
broken in approximately least-recently-used order. (This strategy
normally maintains low fragmentation.) However, for requests less
than 256bytes, it deviates from best-fit when there is not an
exactly fitting available chunk by preferring to use space adjacent
to that used for the previous small request, as well as by breaking
ties in approximately most-recently-used order. (These enhance
locality of series of small allocations.) And for very large requests
(>= 256Kb by default), it relies on system memory mapping
facilities, if supported. (This helps avoid carrying around and
possibly fragmenting memory used only for large chunks.)
All operations (except malloc_stats and mallinfo) have execution
times that are bounded by a constant factor of the number of bits in
a size_t, not counting any clearing in calloc or copying in realloc,
or actions surrounding MORECORE and MMAP that have times
proportional to the number of non-contiguous regions returned by
system allocation routines, which is often just 1. In real-time
applications, you can optionally suppress segment traversals using
NO_SEGMENT_TRAVERSAL, which assures bounded execution even when
system allocators return non-contiguous spaces, at the typical
expense of carrying around more memory and increased fragmentation.
The implementation is not very modular and seriously overuses
macros. Perhaps someday all C compilers will do as good a job
inlining modular code as can now be done by brute-force expansion,
but now, enough of them seem not to.
Some compilers issue a lot of warnings about code that is
dead/unreachable only on some platforms, and also about intentional
uses of negation on unsigned types. All known cases of each can be
ignored.
For a longer but out of date high-level description, see
http://gee.cs.oswego.edu/dl/html/malloc.html
算法简介:
这不是最快的、最节省空间的、最可移植的或最可调的malloc。然而,它是速度最快的,同时
也是最节省空间、可移植和可调的。在这些因素之间保持一致的平衡可以为malloc密集型程序
提供一个良好的通用分配程序。
在大多数情况下,这个malloc是最适合的分配器。通常,它会为请求选择最合适的现有chunk,
并以最近最少使用的顺序断开连接。(这种策略通常保持低碎片化。)但是,对于小于256bytes的
请求,当没有完全合适的可用块时,它就偏离了最佳匹配,因为它倾向于使用与前一个小请求相
邻的空间,并且打破了最近使用的顺序。(这增强了一系列小型内存申请的局部性。)对于非常大的请
求(缺省情况下>= 256Kb),如果支持,它依赖于系统内存映射工具。(这有助于避免携带仅用
于大块的内存,并可能使内存碎片化。)
//略
该实现不是很模块化,并且严重滥用了宏。也许有一天,所有的C编译器都能像现在通过蛮力扩展
所能做的那样,很好地内联模块化代码。
一些编译器会发出大量警告,指出代码只有在某些平台上是死的/不可到达的,还会故意在无符号
类型上使用求反。每种情况的所有已知情况都可以忽略。
原文:https://www.cnblogs.com/hankgo/p/12032677.html