Things to Know About HashMap

Date: 2020-07-21 12:13:30

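In Java programs, HashMap, an implementation of the Map interface, is used all the time in everyday coding. So what does it look like on the inside?

In terms of data structure, HashMap is implemented as an array plus linked lists plus red-black trees (the trees apply to JDK 1.8 and later).

Let's start with its constants and fields.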

　　/**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

As the comment says, this is the default initial capacity, and it must be a power of two. (The table array has a default length of 16.)

　　/**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

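As the comment says, this is the maximum capacity, used when a larger value is implicitly specified through one of the constructors that take arguments. It must be a power of two no greater than 1 << 30. (This is the largest table HashMap will allocate.)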

　　/**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

As the comment says, this is the load factor used when the constructor does not specify one. (The default load factor is 0.75.)

　　/**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

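As the comment says, this is the bin count threshold for using a tree rather than a list for a bin. A bin is converted to a tree when an element is added to a bin that already has at least this many nodes. The value must be greater than 2, and should be at least 8 to mesh with the assumptions that tree removal makes about converting back to a plain bin on shrinkage.

In short, a linked list becomes a red-black tree at a length of 8 (with caveats covered later).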

　　/**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

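As the comment says, this is the bin count threshold for untreeifying a (split) bin during a resize operation. It should be less than TREEIFY_THRESHOLD, and at most 6 to mesh with shrinkage detection under removal.

This is the tree-to-list threshold: when a resize splits a tree bin and one half is left with 6 or fewer nodes, that half is converted back to a linked list.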

　　/**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

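As the comment says, this is the smallest table capacity for which bins may be treeified. (Otherwise the table is resized if a bin holds too many nodes.) It should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resizing and treeification thresholds.

This is the minimum table length for converting a list to a red-black tree (explained later).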

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

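This is the inner class Node. It implements Entry, the nested interface of Map, and has four fields:

final int hash;    // hash of the key, as computed by hash(key)
final K key;       // the key
V value;           // the value
Node<K,V> next;    // the next node in the linked list

Its methods implement the Entry interface, including the equals and hashCode that Entry specifies.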
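The equals and hashCode shown in Node follow the general Map.Entry contract, so an entry taken from a HashMap should compare equal to any other Entry with the same key and value. A minimal sketch of that, where the class name EntryContractDemo and the helper entryHash are made up for illustration:

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class EntryContractDemo {
    // Hash code as written in Node.hashCode above: key hash XOR value hash.
    static int entryHash(Object key, Object value) {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        // The entry handed out by the map and an unrelated Entry implementation.
        Map.Entry<String, Integer> fromMap = map.entrySet().iterator().next();
        Map.Entry<String, Integer> standalone = new AbstractMap.SimpleEntry<>("a", 1);

        System.out.println(fromMap.equals(standalone));
        System.out.println(fromMap.hashCode() == entryHash("a", 1));
    }
}
```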

　　/**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;

This is HashMap's array of nodes, the "array" part of the data structure. (It is referred to as table from here on.)

　　/**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

The number of key-value mappings in the HashMap.

/**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

When size exceeds this value, the table is resized.
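As a quick worked example of the "capacity * load factor" formula from the comment, the sketch below computes the threshold for a few table lengths. ThresholdDemo and threshold are hypothetical names, and the code is plain arithmetic rather than anything taken from the JDK:

```java
class ThresholdDemo {
    // Resize threshold for an allocated table: capacity * load factor, truncated to int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the defaults (16 and 0.75) the table grows once size exceeds 12.
        System.out.println(threshold(16, 0.75f));
        System.out.println(threshold(32, 0.75f));
        System.out.println(threshold(64, 0.75f));
    }
}
```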

 　　/**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

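The number of structural modifications made to the HashMap, such as adding or removing a mapping. Iterators use it to fail fast.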
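The fail-fast behaviour that modCount enables can be observed from outside the class. The sketch below is an illustration with made-up names (FailFastDemo and its two methods): adding a new key during iteration is a structural modification, while overwriting the value of an existing key is not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Returns true if adding a key while iterating trips the fail-fast check.
    static boolean addDuringIteration() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("new-key", 0); // new mapping: structural modification
            }
        } catch (ConcurrentModificationException expected) {
            return true;
        }
        return false;
    }

    // Overwrites every value while iterating; no mapping is added or removed.
    static Map<String, Integer> overwriteDuringIteration() {
        Map<String, Integer> map = sample();
        for (String key : map.keySet()) {
            map.put(key, 42); // existing key: value replaced only
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(addDuringIteration());
        System.out.println(overwriteDuringIteration());
    }
}
```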

Now for the constructors.

1. First, the default constructor

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

It sets the load factor to the default 0.75 and does nothing else. The table itself is not allocated until first use.

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

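This constructor takes the requested initial table length and uses the default load factor of 0.75. It simply delegates to the constructor below, which also accepts a load factor. Let's see what that constructor does.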

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)        // a negative initial capacity is rejected with an exception
            throw new IllegalArgumentException("Illegal initial capacity: " +
                    initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)     // clamp anything above the maximum to MAXIMUM_CAPACITY (1 << 30)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))     // the load factor must be positive and must not be NaN
            throw new IllegalArgumentException("Illegal load factor: " +
                    loadFactor);
        this.loadFactor = loadFactor;               // store the requested load factor
        this.threshold = tableSizeFor(initialCapacity);     // smallest power of two >= initialCapacity, parked in threshold until the table is allocated
    }
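The tableSizeFor method is not listed in this article. One way to get the "round up to a power of two" behaviour is to smear the highest set bit into every lower position and then add one. The version below is a standalone sketch written for illustration; it borrows the method name and the MAXIMUM_CAPACITY constant from the listing, but it should not be read as the exact code of any particular JDK release.

```java
class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= cap, clamped to the range [1, MAXIMUM_CAPACITY].
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // subtract first so an exact power of two maps to itself
        n |= n >>> 1;      // copy the highest set bit into the bit below it
        n |= n >>> 2;      // ...then into the next two bits
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // every bit below the highest set bit is now 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] requested = { 0, 1, 10, 16, 17, 1000 };
        for (int cap : requested) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

For example, a requested capacity of 10 becomes 16, and 17 becomes 32.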

The next constructor takes an existing Map.

/**
     * Constructs a new <tt>HashMap</tt> with the same mappings as the
     * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
     * default load factor (0.75) and an initial capacity sufficient to
     * hold the mappings in the specified <tt>Map</tt>.
     *
     * @param   m the map whose mappings are to be placed in this map
     * @throws  NullPointerException if the specified map is null
     */
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;      // use the default load factor of 0.75
        putMapEntries(m, false);
    }

    /**
     * Implements Map.putAll and Map constructor
     *
     * @param m the map
     * @param evict false when initially constructing this map, else
     * true (relayed to method afterNodeInsertion).
     */
    final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
        int s = m.size();
        if (s > 0) {            // nothing to do for an empty source map
            if (table == null) { // pre-size        // the table has not been allocated yet
                float ft = ((float)s / loadFactor) + 1.0F;
                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                        (int)ft : MAXIMUM_CAPACITY);        // capacity needed for s entries (s / loadFactor + 1, truncated to int), capped at MAXIMUM_CAPACITY
                if (t > threshold)
                    threshold = tableSizeFor(t);    // threshold is 0 when called from the constructor, so it becomes the smallest power of two >= t
            }
            else if (s > threshold)
                resize();
            for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
                K key = e.getKey();
                V value = e.getValue();
                putVal(hash(key), key, value, false, evict);        // insert each entry; evict is only relayed to afterNodeInsertion, which is an empty hook in HashMap itself (LinkedHashMap overrides it)
            }
        }
    }

That is all for the constructors for now; the open questions they raise are answered further down. Next, let's look at what put does.

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);      // hash(key) spreads the key's hashCode; the result is passed to putVal
    }

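// Next, the hash method
static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);       // a null key hashes to 0; otherwise XOR the high 16 bits into the low 16 bits so the high bits also influence the bucket index, which reduces collisions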
    }
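To see why the high bits are folded in, consider two hash codes that differ only above bit 15. In a 16-slot table the index uses just the low 4 bits, so without spreading both land in the same bucket. The sketch below is an illustration: HashSpreadDemo, spread and indexFor are made-up names, and 0x10000 and 0x20000 are arbitrary sample hash codes.

```java
class HashSpreadDemo {
    // Same expression as in the hash method above, applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Raw hash codes: the low 4 bits are identical, so both map to bucket 0.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));
        // Spread hash codes: the high bits now reach the index, giving buckets 1 and 2.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16));
    }
}
```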

// Next, the putVal method
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;     // tab: local reference to table; p: the node currently at index i; n: table length; i: bucket index computed from hash
        if ((tab = table) == null || (n = tab.length) == 0)     // table not allocated yet: resize() creates it (length 16 and load factor 0.75 by default)
            n = (tab = resize()).length;    // n is the length of the new table, 16 by default
        if ((p = tab[i = (n - 1) & hash]) == null)      // (n - 1) & hash selects the bucket index i for this hash
            tab[i] = newNode(hash, key, value, null);       // the bucket is empty: store a new Node there and skip the else branch
        else {
            // The bucket at index i is already occupied
            Node<K,V> e; K k;           // e: the node holding an existing mapping for key, if any; k: scratch variable for the key being compared
            if (p.hash == hash &&
                    ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;                          // the head node has the same hash and an equal key: remember it so the if (e != null) block below overwrites its value
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);     // the bucket is a red-black tree: putTreeVal returns the existing node for this key, or inserts a new one, rebalances, and returns null
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {     // e is the node after p; null means p is the tail
                        p.next = newNode(hash, key, value, null);           // key not found in the list: append a new node at the tail. The check below runs after the append, so a bin that already held 8 nodes has 9 by the time treeifyBin is called
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);          // try to convert the list to a tree (treeifyBin may resize instead), then leave the loop
                        break;
                    }
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))     // e has the same hash and an equal key: an existing mapping was found, leave the loop with e set
                        break;
                    p = e;      // neither case applied: advance p to the next node and keep scanning
                }
            }
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;                     // one more structural modification
        if (++size > threshold)         // size now exceeds the threshold: grow the table
            resize();
        afterNodeInsertion(evict);
        return null;
    }
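The two exits of putVal (return oldValue for an existing mapping, return null for a new one) are visible through the public API. The sketch below is an illustration with a made-up class name, PutReturnDemo; it also calls putIfAbsent, which corresponds to the onlyIfAbsent = true path and leaves an existing non-null value untouched.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    // Returns { result of 1st put, result of 2nd put, result of putIfAbsent, final value }.
    static Integer[] demo() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("k", 1);          // no previous mapping, so null
        Integer second = map.put("k", 2);         // replaces 1 and returns it
        Integer third = map.putIfAbsent("k", 3);  // keeps 2 and returns it
        return new Integer[] { first, second, third, map.get("k") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [null, 1, 2, 2]
    }
}
```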

The treeifyBin method is listed below.

final void treeifyBin(Node<K,V>[] tab, int hash) {      // convert a linked-list bin to a tree
        int n, index; Node<K,V> e;
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)     // table shorter than MIN_TREEIFY_CAPACITY (64): resize instead of treeifying
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {       // the bucket for this hash is non-empty: rebuild its nodes as TreeNodes and treeify
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

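Question 1: Why does HashMap round the requested initial table length up to a power of two?

The reason is in putVal above, where the bucket index is derived from the key's hash:

tab[i = (n - 1) & hash]

When n is a power of two, n - 1 is a mask of low-order one bits, so this AND produces the same index that hash % table.length would for a non-negative hash. If n is not a power of two the equivalence breaks down, and some indices can never be selected.

A single AND is also cheaper than the division behind a modulo, and the index is computed on every get and put. That is why the table length must be a power of two, and why any other requested value is rounded up to one.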
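The equivalence between masking and modulo can be checked directly. The sketch below compares (n - 1) & hash with Math.floorMod(hash, n), which is the non-negative remainder, over a handful of sample hashes including negative ones. IndexMaskDemo and its methods are made-up names, and the sample values are arbitrary.

```java
class IndexMaskDemo {
    static int maskIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    // True when masking agrees with the non-negative remainder for every sample.
    static boolean agreesWithFloorMod(int n) {
        int[] samples = { 0, 1, 15, 16, 17, 12345, -1, -17,
                          Integer.MAX_VALUE, Integer.MIN_VALUE };
        for (int h : samples) {
            if (maskIndex(h, n) != Math.floorMod(h, n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesWithFloorMod(16)); // power of two
        System.out.println(agreesWithFloorMod(64)); // power of two
        System.out.println(agreesWithFloorMod(10)); // not a power of two
    }
}
```

With n = 10 the mask is binary 1001, so only the indices 0, 1, 8 and 9 can ever be produced, which is why the check fails for that length.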


Question 2: Why is the load factor 0.75? Is it worth changing?
Here we can go back to the HashMap class comment:

 * <p>As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs.  Higher values decrease the
 * space overhead but increase the lookup cost (reflected in most of
 * the operations of the <tt>HashMap</tt> class, including
 * <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations.  If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.

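As the comment shows, 0.75 is the load factor HashMap recommends. But where does 0.75 come from, and why not some other value?

First consider time. With a load factor of 0.5, hash collisions become much less likely, because the table is resized once it is half full, and lookups get faster. (Drawback: lower space utilization, since half of the table is still unused at every resize. This trades space for time.)

Then consider space. With a load factor of 1, the table is resized only when the number of entries reaches the table length, so space utilization is higher. (Drawback: collisions become more likely and lookups slower. This trades time for space.)

From both angles, 0.75 is a reasonable middle ground. There is also a frequently repeated derivation based on the binomial theorem that puts the ideal value between 0.5 and 1 at roughly 0.69 (ln 2 ≈ 0.693), with 0.75 chosen as a convenient value near it. The JDK comment quoted above does not make that claim; it only describes 0.75 as a good tradeoff between time and space costs.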
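The last sentence of the quoted comment gives a practical rule: if the initial capacity is greater than the expected number of entries divided by the load factor, no rehash will occur. A minimal sketch of applying that rule follows; PresizeDemo and capacityFor are hypothetical names, and 100 is an arbitrary expected entry count.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Hypothetical helper: an initial capacity strictly greater than expected / loadFactor.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        int capacity = capacityFor(expected, 0.75f);
        System.out.println(capacity);

        // HashMap rounds the request up to a power of two internally,
        // so the table is large enough for all 100 entries from the start.
        Map<Integer, String> map = new HashMap<>(capacity);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size());
    }
}
```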

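Question 3: When is a linked list converted to a red-black tree?

The answer is in the source above. When a put appends to a bin that already holds 8 nodes, HashMap tries to convert the list to a tree. The new node is linked in before the check, so the list has 9 nodes at that moment, not 8. In addition, if the table length is below MIN_TREEIFY_CAPACITY (64), the attempt only resizes the table and does not build a tree; see treeifyBin and putVal. So two conditions are both required for a list to become a tree: the bin holds at least 8 nodes before the insertion, and table.length >= 64.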
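The two conditions can be condensed into a small decision function. This is a simplified model of the branch structure in putVal and treeifyBin, not code from the JDK: TreeifyDecisionDemo, Action and afterAppend are made-up names, and the function only describes what happens after a new node has been appended to a non-empty list bin.

```java
class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { APPEND_ONLY, RESIZE, TREEIFY }

    // nodesBeforeInsert: length of the list bin before the new node is appended (>= 1).
    static Action afterAppend(int nodesBeforeInsert, int tableLength) {
        if (nodesBeforeInsert < TREEIFY_THRESHOLD) {
            return Action.APPEND_ONLY;             // list is still short
        }
        return tableLength < MIN_TREEIFY_CAPACITY
                ? Action.RESIZE                    // small table: grow it instead
                : Action.TREEIFY;                  // large enough: build the tree
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(7, 64));  // APPEND_ONLY
        System.out.println(afterAppend(8, 16));  // RESIZE
        System.out.println(afterAppend(8, 64));  // TREEIFY
    }
}
```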

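Question 4: Why is 8 the length at which HashMap tries to convert to a red-black tree?

Let's look at the implementation notes in the HashMap source: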

　　　* Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million

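The rows 0 through 8 above are bin lengths, each with its probability under the Poisson distribution. The probability of a bin reaching length 8 is about 6 in 100 million, close enough to zero to ignore, which is why 8 was chosen as the threshold.

Evaluating exp(-0.5) * pow(0.5, k) / factorial(k) at k = 8 gives roughly 0.00000006.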
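The table in the comment can be reproduced by evaluating that formula for k = 0 through 8. The sketch below does so; PoissonBinDemo and probability are made-up names, and the parameter 0.5 is the one stated in the quoted comment.

```java
class PoissonBinDemo {
    // exp(-0.5) * pow(0.5, k) / factorial(k), the formula from the JDK comment.
    static double probability(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        // One line per bin length, formatted to 8 decimal places like the comment.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k));
        }
    }
}
```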

Original: https://www.cnblogs.com/ghsy/p/13352708.html
