
Things to Know About HashMap

Date: 2020-07-21 12:13:30

  In Java programs, HashMap, an implementation of the Map interface, is used constantly in everyday code. So what does it look like inside?

 

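  In terms of data structure, HashMap is implemented as an array plus linked lists plus red-black trees (the red-black trees were added after JDK 1.7, i.e. in JDK 1.8).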

 

  Let's start with its constants and fields.

  /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

  Per the comment above: the default initial capacity must be a power of two. (The table array defaults to a length of 16.)

  

  /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

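  Per the comment above: this is the maximum capacity, used when either of the constructors with arguments implicitly specifies a higher value. It must be a power of two no greater than 1 << 30. (This is the largest table HashMap will allocate.)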

  /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

  Per the comment above: the load factor used when none is specified in the constructor. (The default load factor is 0.75.)

 

  /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

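  Per the comment above: this is the bin count threshold for using a tree rather than a list for a bin. A bin is converted to a tree when an element is added to a bin that already holds at least this many nodes. The value must be greater than 2, and should be at least 8 to mesh with the assumptions in tree removal about converting back to plain bins upon shrinkage.

  In short: a linked list becomes a candidate for conversion to a red-black tree at length 8 (the exact condition is covered later).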

  /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

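  Per the comment above: this is the bin count threshold for untreeifying a (split) bin during a resize operation. It should be less than TREEIFY_THRESHOLD, and at most 6 to mesh with shrinkage detection under removal.

  In short: when a resize splits a tree bin and a resulting half holds 6 or fewer nodes, that half is converted back to a linked list.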

  /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

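  Per the comment above: the smallest table capacity for which bins may be treeified. (Otherwise the table is resized if a bin has too many nodes.) It should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resizing and treeification thresholds.

  In short: the minimum table length required before a list is converted to a red-black tree (explained later).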

 

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

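  This is the inner class Node. It implements Entry, the nested interface of Map, and has four fields:

  final int hash;  // hash of the key, as computed by HashMap.hash(key)
  final K key;    // the key
  V value;      // the value
  Node<K,V> next;  // next node in the bucket's linked list

  Its methods implement the Entry interface, including the equals and hashCode behavior that Entry specifies.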
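
  To see the Entry contract that Node follows, here is a minimal sketch using only public API. The class name EntryContractDemo and its helper methods are made up for this illustration; it checks that an entry taken from a HashMap hashes as key hash XOR value hash, and that it equals an entry from a different implementation with the same key and value.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class EntryContractDemo {
    // Fetches the single entry of a one-element HashMap.
    static Map.Entry<String, Integer> sampleEntry() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        return map.entrySet().iterator().next();
    }

    // Entry.hashCode() is specified as hash(key) XOR hash(value).
    static boolean hashMatchesContract() {
        Map.Entry<String, Integer> e = sampleEntry();
        return e.hashCode() == (Objects.hashCode("a") ^ Objects.hashCode(1));
    }

    // Entries from different Map.Entry implementations are equal
    // when both key and value are equal.
    static boolean equalsOtherImplementation() {
        return sampleEntry().equals(new AbstractMap.SimpleEntry<>("a", 1));
    }

    public static void main(String[] args) {
        System.out.println(hashMatchesContract());       // true
        System.out.println(equalsOtherImplementation()); // true
    }
}
```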

  /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;

  The array of nodes, i.e. the "array" part of the data structure. (Referred to as table from here on.)


  /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

  The number of key-value mappings in the HashMap.

    

/**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

  Once size exceeds this value, the table is resized.
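
  As a small worked example of the threshold described above (capacity * load factor), the sketch below computes it for two table lengths. The class ThresholdDemo and its threshold helper are hypothetical names for this illustration, not part of HashMap.

```java
final class ThresholdDemo {
    // Hypothetical helper: resize threshold = capacity * load factor,
    // truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default table: 16 buckets with load factor 0.75.
        System.out.println(threshold(16, 0.75f)); // 12
        // After one doubling: 32 buckets.
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```

  With the defaults, the table therefore grows when the 13th mapping is added.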

 

   /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

  The number of structural modifications made to the map; iterators use it to fail fast.
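
  The effect of modCount can be observed from outside. The following sketch (FailFastDemo and its method names are invented for this example) adds a new key while iterating, which is a structural modification, and then only replaces values of existing keys, which is not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class FailFastDemo {
    static Map<Integer, String> sampleMap() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");
        return map;
    }

    // Returns true if adding a key during iteration made the iterator throw.
    static boolean structuralChangeBreaksIteration() {
        Map<Integer, String> map = sampleMap();
        try {
            for (Integer key : map.keySet()) {
                // A new key changes the number of mappings, so modCount changes.
                map.put(key + 100, "x");
            }
        } catch (ConcurrentModificationException expected) {
            return true;
        }
        return false;
    }

    // Returns true if replacing values of existing keys during iteration is tolerated.
    static boolean valueUpdateIsAllowed() {
        Map<Integer, String> map = sampleMap();
        try {
            for (Integer key : map.keySet()) {
                // Existing key: only the value is replaced, no structural change.
                map.put(key, "updated");
            }
        } catch (ConcurrentModificationException unexpected) {
            return false;
        }
        return "updated".equals(map.get(2));
    }

    public static void main(String[] args) {
        System.out.println(structuralChangeBreaksIteration()); // true
        System.out.println(valueUpdateIsAllowed());            // true
    }
}
```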

 

  Now for the constructors:

  1. First, the default constructor

 
    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

  It sets the load factor to the default of 0.75 and does nothing else.

 

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

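  This one takes the requested initial table length and uses the default load factor of 0.75. It simply delegates to the constructor below, which also accepts a load factor, so let's see what that one does.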

  

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)        // a negative capacity is rejected with an exception
            throw new IllegalArgumentException("Illegal initial capacity: " +
                    initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)     // anything above the maximum is clamped to MAXIMUM_CAPACITY (1 << 30)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))     // the load factor must be a positive number, otherwise throw
            throw new IllegalArgumentException("Illegal load factor: " +
                    loadFactor);
        this.loadFactor = loadFactor;               // store the requested load factor
        this.threshold = tableSizeFor(initialCapacity);     // smallest power of two >= initialCapacity (see the method's comment)
    }
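
  The rounding performed by tableSizeFor can be sketched as follows. This version is modelled on the bit-smearing approach found in JDK 8; treat it as an illustration rather than a copy of any particular JDK release, since other versions may compute the same result differently. The class name TableSizeDemo is made up for the example.

```java
final class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds cap up to the nearest power of two: copy the highest set bit
    // of (cap - 1) into every lower bit, then add one.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

  Subtracting one first is what keeps an exact power of two, such as 16, from being doubled.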

  

  The next constructor takes an existing Map.

  
/**
     * Constructs a new <tt>HashMap</tt> with the same mappings as the
     * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
     * default load factor (0.75) and an initial capacity sufficient to
     * hold the mappings in the specified <tt>Map</tt>.
     *
     * @param   m the map whose mappings are to be placed in this map
     * @throws  NullPointerException if the specified map is null
     */
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;      // use the default load factor of 0.75
        putMapEntries(m, false);
    }

    /**
     * Implements Map.putAll and Map constructor
     *
     * @param m the map
     * @param evict false when initially constructing this map, else
     * true (relayed to method afterNodeInsertion).
     */
    final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
        int s = m.size();
        if (s > 0) {            // only do work if the source map is non-empty
            if (table == null) { // pre-size        // the table has not been allocated yet
                float ft = ((float)s / loadFactor) + 1.0F;
                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                        (int)ft : MAXIMUM_CAPACITY);        // capacity needed for s entries, truncated to an int and capped at MAXIMUM_CAPACITY
                if (t > threshold)
                    threshold = tableSizeFor(t);    // if t exceeds threshold (0 at this point), store the smallest power of two >= t in threshold
            }
            else if (s > threshold)
                resize();
            for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
                K key = e.getKey();
                V value = e.getValue();
                putVal(hash(key), key, value, false, evict);        // insert each entry into the table; evict is false here and has little practical effect (it is only relayed to afterNodeInsertion)
            }
        }
    }

  That's it for the constructors; the open questions are answered further down. Next, let's see what put does.


public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);      // compute the key's hash with hash() and pass it on to putVal
    }

// Next, the hash method
static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);       // a null key hashes to 0; otherwise the high 16 bits are XORed into the low 16 bits to reduce collisions
    }

// Next, the putVal method
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;     // tab: local reference to table; p: the node currently at index i; n: table length; i: bucket index for this hash
        if ((tab = table) == null || (n = tab.length) == 0)     // table not allocated yet: resize() creates it (length 16 and load factor 0.75 by default)
            n = (tab = resize()).length;    // n becomes the length of the new table, 16 by default
        if ((p = tab[i = (n - 1) & hash]) == null)      // (n - 1) & hash gives the bucket index i for this hash
            tab[i] = newNode(hash, key, value, null);       // the bucket is empty: place a new Node there; the else branch is skipped
        else {
            // From here on, the bucket at index i already holds at least one node
            Node<K,V> e; K k;           // e: the existing node for this key, if any; k: scratch variable for the key being compared
            if (p.hash == hash &&
                    ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;                          // the first node in the bucket has the same hash and an equal key; nothing else runs, and the value is overwritten in the "if (e != null)" block below (same key replaces value, as HashMap promises)
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);     // the bucket is already a red-black tree: returns the existing node if the key is found, otherwise inserts a new node, rebalances, and returns null
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {     // e is the node after p; null means p is the tail
                        p.next = newNode(hash, key, value, null);           // append the new node at the tail. Important: the check below runs after the append, so although TREEIFY_THRESHOLD is 8, the list is already longer than 8 when treeifyBin is called
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);          // try to convert the list to a tree, then leave the loop
                        break;
                    }
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))     // e has the same hash and an equal key: the mapping already exists, so leave the loop with e set
                        break;
                    p = e;      // neither case applied: advance p to the next node and keep walking the list
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;                     // one more structural modification
        if (++size > threshold)         // resize once size exceeds the threshold
            resize();
        afterNodeInsertion(evict);
        return null;
    }
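
  The hash mixing and index computation used by put can be tried in isolation. In the sketch below, HashSpreadDemo, spread and indexFor are names chosen for this example; the two sample hash codes differ only above the low 16 bits, so without mixing they land in the same bucket of a 16-slot table.

```java
final class HashSpreadDemo {
    // Same mixing step as the hash() method shown above, applied to a raw hashCode.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n, where n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        int n = 16;
        // Without mixing, only the low 4 bits matter and both are 0.
        System.out.println(indexFor(a, n) + " " + indexFor(b, n));                 // 0 0
        // With mixing, the high bits influence the index.
        System.out.println(indexFor(spread(a), n) + " " + indexFor(spread(b), n)); // 1 2
    }
}
```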

The treeifyBin method is shown below:

final void treeifyBin(Node<K,V>[] tab, int hash) {      // convert a bucket's list into a tree
        int n, index; Node<K,V> e;
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)     // if the table is shorter than MIN_TREEIFY_CAPACITY (64), resize instead of building a tree
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {       // otherwise, if the bucket for this hash is non-empty, convert its list into a tree
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }
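
  Whether a bucket is currently a list or a tree is not visible through the public API, but the behavior under heavy collisions is. The sketch below uses a hypothetical key class, BadKey, whose hashCode is deliberately constant so that every key lands in the same bucket; the map still stores and finds every mapping. BadKey implements Comparable on the assumption that tree bins can use that ordering when hashes tie.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKeysDemo {
    // Hypothetical key type: every instance has the same hash code.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Inserts count keys that all collide into one bucket.
    static Map<BadKey, Integer> fill(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(100);
        System.out.println(map.size());              // 100
        System.out.println(map.get(new BadKey(57))); // 57
    }
}
```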

 

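Question 1: Why does HashMap round the requested initial table length up to a power of two?

  The reason is in how putVal turns the key's hash into a bucket index:
  tab[i = (n - 1) & hash]  When n is a power of two, this bitwise AND gives the same result as taking the hash modulo table.length (as a non-negative remainder). If n is not a power of two, the equivalence no longer holds.

  A bitwise AND is also cheaper than a modulo operation, and the index is computed on every lookup and insert. So the table length must be a power of two, and any other requested length is rounded up to one.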
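
  The equivalence can be checked directly. The sketch below (IndexMaskDemo and its methods are names invented for this example) compares (n - 1) & hash against Math.floorMod for a handful of hash values, including negative ones, and counts how many distinct buckets the mask can reach. Math.floorMod is used rather than % because % can return a negative result in Java.

```java
final class IndexMaskDemo {
    // True if masking with (n - 1) agrees with a non-negative modulo for all samples.
    static boolean maskMatchesFloorMod(int n) {
        int[] hashes = {0, 1, 15, 16, 17, 12345, -1, -98765,
                        Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : hashes) {
            if (((n - 1) & h) != Math.floorMod(h, n)) {
                return false;
            }
        }
        return true;
    }

    // Number of distinct bucket indexes produced by (n - 1) & h for h in [0, 1024).
    static int distinctIndices(int n) {
        boolean[] seen = new boolean[n];
        int count = 0;
        for (int h = 0; h < 1024; h++) {
            int i = (n - 1) & h;
            if (!seen[i]) {
                seen[i] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(maskMatchesFloorMod(16)); // true
        System.out.println(maskMatchesFloorMod(10)); // false
        System.out.println(distinctIndices(16));     // 16
        System.out.println(distinctIndices(10));     // 4
    }
}
```

  With a length of 10 the mask is 9 (binary 1001), so only buckets 0, 1, 8 and 9 would ever be used.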

Question 2: Why is the load factor 0.75? Is there any need to change it?
  Here we can go back to the HashMap Javadoc:
 * <p>As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs.  Higher values decrease the
 * space overhead but increase the lookup cost (reflected in most of
 * the operations of the <tt>HashMap</tt> class, including
 * <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations.  If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.

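  So 0.75 is the load factor HashMap itself recommends. But where does it come from, and why not some other value?

  Consider time first. Setting the load factor to 0.5 makes hash collisions much less likely, because the table is resized once it is half full, so lookups get considerably faster. (Drawback: lower space utilization, since half the table is still unused at every resize. This trades space for time.)

  Now consider space. With a load factor of 1, the table is resized only when the number of entries reaches its length, so space utilization is higher. (Drawback: more hash collisions and slower lookups. This trades time for space.)

  Weighing time against space, 0.75 is a reasonable middle ground. There is more to it, though: a commonly cited derivation based on the binomial theorem puts the ideal value between 0.5 and 1 at about ln 2 ≈ 0.693, and the HashMap developers settled on 0.75 as a compromise.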
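
  The Javadoc rule quoted above (no rehash occurs if the initial capacity is greater than the maximum number of entries divided by the load factor) suggests a simple way to pre-size a map. The helper name initialCapacityFor and the class PresizeDemo are assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizeDemo {
    // Hypothetical helper: an initial capacity strictly greater than
    // expectedEntries / loadFactor.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int capacity = initialCapacityFor(100, 0.75f);
        System.out.println(capacity); // 134

        // HashMap rounds the requested capacity up to a power of two internally.
        Map<String, Integer> map = new HashMap<>(capacity);
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 100
    }
}
```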

 

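  Question 3: When is a linked list converted to a red-black tree?

  The answer is in the source above. When a node is appended to a list that already holds 8 nodes, HashMap attempts to convert the list to a tree. The next node is created before the check, so by then the list is already longer than 8. In addition, if the table length is below MIN_TREEIFY_CAPACITY (64), the attempt only resizes the table and no tree is built; see treeifyBin and putVal. So two conditions must both hold for a list to become a tree: list length >= 8 && table.length >= 64.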
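
  The two conditions can be written down as a small decision function. This is a simplified model of the logic in putVal and treeifyBin, not code from the JDK; the enum Action and the method onAppend are hypothetical names, and nodesBeforeInsert means the length of the list before the new node is appended.

```java
final class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { APPEND_ONLY, RESIZE, TREEIFY }

    // Models what happens after a new node is appended to the tail of a list bin.
    static Action onAppend(int nodesBeforeInsert, int tableLength) {
        if (nodesBeforeInsert < TREEIFY_THRESHOLD) {
            return Action.APPEND_ONLY;   // list is still short: nothing else happens
        }
        // The list was long enough, so treeifyBin is called.
        return tableLength < MIN_TREEIFY_CAPACITY
                ? Action.RESIZE          // small table: grow it instead
                : Action.TREEIFY;        // large enough table: build the tree
    }

    public static void main(String[] args) {
        System.out.println(onAppend(7, 64)); // APPEND_ONLY
        System.out.println(onAppend(8, 16)); // RESIZE
        System.out.println(onAppend(8, 64)); // TREEIFY
    }
}
```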

 

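  Question 4: Why is 8 the list length at which conversion to a red-black tree is attempted?

  Again, the implementation notes in the HashMap source explain it: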

  

   * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million

 

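The table above lists, for list lengths 0 through 8, the probability of a bin reaching that length (computed from the Poisson distribution). At length 8 the probability is about 6 in 100 million, which is close enough to 0 to ignore.
The last row is exp(-0.5) * pow(0.5, k) / factorial(k) evaluated at k = 8.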
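
  The values in the table can be reproduced with a few lines of code. The sketch below evaluates the formula from the source comment for k = 0 to 8; the class name PoissonDemo and the method probability are made up for this example.

```java
final class PoissonDemo {
    // Poisson probability of exactly k nodes in a bin, with parameter 0.5:
    // exp(-0.5) * 0.5^k / k!
    static double probability(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        // Prints one line per list length, rounded to 8 decimal places
        // like the table in the source comment.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k));
        }
    }
}
```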

  


Original: https://www.cnblogs.com/ghsy/p/13352708.html
