HashMap分析

时间：2020-02-15 23:19:07 阅读：99 评论：0 收藏：0 [点我收藏+]

1.HashMap简介

HashMap基于哈希表的Map接口实现。是以key-value存储形式存在。线程不安全。key和value都可以为null，无序

JDK1.8之前由数组+链表组成，数组是HashMap主体，链表则主要是为了解决哈希冲突（两个对象调用的hashCode方法计算的哈希码值一致导致计算的数组索引值相同）而存在的（“拉链法”解决冲突），JDK1.8之后，当链表长度大于阈值（或者红黑树的边界值，默认为8）并且当前数组的长度大于64时，此时此索引位置上的所有数据改为使用红黑树存储。加入红黑树可以使查询效率更高。

补充：为了提高效率，将链表转换为红黑树前会判断，即使阈值大于8，但是数组长度小于64，此时并不会将链表变为红黑树，而是选择进行数组扩容。

技术分享图片

2.HashMap集合底层的数据结构

JDK1.8之前，数组+链表

JDK1.8之后，数组+链表+红黑树

技术分享图片

问题1：

1、哈希表底层采用何种算法计算hash值？还有哪些算法可以计算出hash值？
底层采用的key的hashCode方法的值结合数组长度进行无符号右移（>>>)、按位异或(^)计算hash值，按位与(&)计算出索引

static final int hash(Object key) {
      int h;
      return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
 }
//其中n为数组长度
(n - 1) & hash

还可以采用：平方取中法，取余数、伪随机数法

2.当两个对象的hashCode相等时会怎么样？

会产生哈希碰撞，通过调用equals方法比较key的内容是否相同，相同则替换旧的value，不然就连接到链表后面，链表长度超过阈值8转为红黑树。

3.在不断的添加数据的过程中，会涉及到扩容问题，当超出临界值时扩容，默认的扩容方式为扩充为原来的2倍，并将原有的数据复制过来。

4.1.8之后为什么引入红黑树，这样不是使结构更加复杂了吗？为什么阈值大于8转化成红黑树？

技术分享图片

说明：

size表示HashMap中K-V的实时数量，不是数组的长度
threshold（临界值）=capacity（容量）*loadFactor（加载因子）。这个值是当前已占用数组长度的最大值。size超过这个临界值就重新resize（扩容），扩容后的HashMap容量是之前容量的两倍

3.HashMap继承关系

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {

    private static final long serialVersionUID = 362498820763181265L;

public abstract class AbstractMap<K,V> implements Map<K,V> {
    /**
     * Sole constructor.  (For invocation by subclass constructors, typically
     * implicit.)
     */
    protected AbstractMap() {
    }

技术分享图片

4.HashMap集合类的成员

4.1成员变量

1、序列化版本号

private static final long serialVersionUID = 362498820763181265L;

2、集合的初始化容量（必须是2的n次幂）

  /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

问题：为什么大小必须是2的n次幂？

存储高效，尽量减少碰撞，在（length-1)&hash求索引的时候更均匀。

技术分享图片

问题：如果传入的容量默认不是2的幂，假如是10，会怎么样呢？

底层通过一些列的右移和或运算，把给定值变成比它大的最小的2的次数值，比如给10变成16，给17变成32。

//对传入容量进行右移位运算后进行或运算
//一共进行5次或运算，可以将当前数字中二进制最高位1的右边全部变成1，最后+1后返回
static final int tableSizeFor(int cap) {
        //这里-1的目的是使得找到的目标值大于或等于原值
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

技术分享图片

完整例子：

技术分享图片

 public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

3、默认的负载因子

static final float DEFAULT_LOAD_FACTOR = 0.75f;

4、集合最大容量

static final int MAXIMUM_CAPACITY = 1 << 30;

5、链表转红黑树的阈值

static final int TREEIFY_THRESHOLD = 8;

问题：为什么是8？

TreeNode占用空间是普通Node的两倍，空间和时间的权衡，同时如果为8，log（8）=3小于链表的平均8/2=4

  /* Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k)* /

技术分享图片

还有一种解释方式：

技术分享图片

6、红黑树转链表的阈值

static final int UNTREEIFY_THRESHOLD = 6;

7、链表转红黑树时数组的大小的阈值，即数组大小大于这个数字时，链表长度大于8才会转为红黑树

static final int MIN_TREEIFY_CAPACITY = 64;

8、table用来初始化数组（大小是2的n次幂）

transient Node<K,V>[] table;

技术分享图片

9、用来存放缓存（遍历的时候使用）

transient Set<Map.Entry<K,V>> entrySet;

10、HashMap中存放元素的个数（重点）

transient int size;

11、记录HashMap的修改次数

transient int modCount;

12、临界值（如果存放元素大小大于该值，则进行扩容）

int threshold;

13、哈希表的加载因子（重点）

final float loadFactor

技术分享图片

说明：

loadFactor加载因子，可以表示HashMap的舒米程度，影响hash操作到同一个数组位置的概率，默认0.75，不建议修改

4.2构造方法

HashMap分析

原文：https://www.cnblogs.com/xinmomoyan/p/12313875.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年09月23日 (328)
2021年09月24日 (313)
2021年09月17日 (191)
2021年09月15日 (369)
2021年09月16日 (411)
2021年09月13日 (439)
2021年09月11日 (398)
2021年09月12日 (393)
2021年09月10日 (160)
2021年09月08日 (222)