Java collection framework

Ahad Arif
Nov 6, 2021


1.1.1. Overview of Java Collections

Java collections, also known as containers, derive mainly from two interfaces: the Collection interface, which stores individual elements, and the Map interface, which stores key-value pairs. The Collection interface in turn has three main sub-interfaces: List, Set, and Queue.

The Java collection framework is shown in the following figure:

Note: the figure only shows the main inheritance and derivation relationships, not all of them. Auxiliary types such as the AbstractList abstract class and the NavigableSet interface are omitted. If you want to learn more, check the source code yourself.

1.1.2. What is the difference between List, Set, Queue and Map?

  • List (the ordered helper): the stored elements are ordered and repeatable.
  • Set (uniqueness matters): the stored elements are unordered and non-repeatable.
  • Queue (the ticket machine that implements queuing): the order is determined by specific queuing rules; the stored elements are ordered and repeatable.
  • Map (the expert at lookup by key): stores key-value pairs, similar to the mathematical function y = f(x): "x" represents the key and "y" the value. Keys are unordered and non-repeatable, values are unordered and repeatable, and each key maps to at most one value.
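The four contracts above can be checked directly in code. A minimal sketch (the class name CollectionBasics is ours, not from the article):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CollectionBasics {
    public static void main(String[] args) {
        // List: keeps order and duplicates
        List<String> list = new ArrayList<>(List.of("a", "b", "a"));
        // Set: collapses duplicates
        Set<String> set = new HashSet<>(list);
        // Map: each key maps to at most one value, so a second put replaces
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("a", 2);
        System.out.println(list.size());  // 3
        System.out.println(set.size());   // 2
        System.out.println(map.get("a")); // 2
    }
}
```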

1.1.3. Summary of the underlying data structure of the collection framework

First, look at the interfaces under Collection.

1.1.3.1. List

  • ArrayList: Object[] array
  • Vector: Object[] array
  • LinkedList: doubly linked list (a circular linked list before JDK 1.6; JDK 1.7 removed the cycle)

1.1.3.2. Set

  • HashSet (unordered, unique): based on HashMap; the underlying HashMap holds the elements
  • LinkedHashSet: a subclass of HashSet, implemented internally with a LinkedHashMap, which in turn is based on HashMap as described earlier, with only small differences
  • TreeSet(Ordered, unique): Red-black tree (self-balanced sorting binary tree)

1.1.3.3. Queue

  • PriorityQueue: Object[] array implementing a binary heap
  • ArrayDeque: Object[] array + double pointers

Next, look at the interfaces under Map.

1.1.3.4. Map

  • HashMap: before JDK 1.8, HashMap was composed of an array plus linked lists; the array is the body of HashMap, and the linked lists exist mainly to resolve hash collisions (the "zipper method"). Since JDK 1.8, collision resolution has changed significantly: when the length of a linked list exceeds the threshold (8 by default), the list is converted into a red-black tree to reduce search time (before converting, a check is made: if the current array length is less than 64, the array is expanded first instead of converting to a red-black tree)
  • LinkedHashMap: LinkedHashMapInherited from HashMap, so its bottom layer is still based on the zipper hash structure, which is composed of arrays and linked lists or red-black trees. In addition, LinkedHashMapon the basis of the above structure, a doubly linked list is added, so that the above structure can maintain the insertion order of key-value pairs. At the same time, through corresponding operations on the linked list, the access sequence-related logic is realized.
  • Hashtable: composed of an array plus linked lists; the array is the body of Hashtable, and the linked lists exist mainly to resolve hash collisions
  • TreeMap: Red-Black Tree (Self-balanced Sorting Binary Tree)

1.1.4. How to choose a collection?

Choose according to the main characteristics you need. When we need to get an element value by key, choose an implementation under the Map interface: TreeMap when sorting is needed, HashMap when it is not, and ConcurrentHashMap when thread safety must be guaranteed.

When we only need to store element values, choose an implementation of the Collection interface: if elements must be unique, pick an implementation of Set such as TreeSet or HashSet; if not, pick an implementation of List such as ArrayList or LinkedList, then choose among them according to their characteristics.

1.1.5. Why use collections?

When we need to store a group of data of the same type, we should use a container, and the simplest container is an array. However, arrays have drawbacks: in real development the data to be stored is diverse, which is why "collections" appeared; collections likewise store multiple pieces of data.

The disadvantages of an array are that once it is declared, its length is immutable; the declared element type fixes what the array can store; and the data in an array is ordered, repeatable, and uniform. Collections improve the flexibility of data storage: Java collections can store objects of different types and in varying numbers, and can also store data with mapping relationships.

1.2. The List sub-interface of Collection

1.2.1. What is the difference between ArrayList and Vector?

  • ArrayList is the main implementation class of List, backed by Object[] storage; it suits frequent lookups and is not thread-safe;
  • Vector is the legacy implementation class of List, also backed by Object[] storage, and it is thread-safe.

1.2.2. What is the difference between ArrayList and LinkedList?

  1. Thread safety: neither ArrayList nor LinkedList is synchronized, i.e. neither guarantees thread safety;
  2. Underlying data structure: ArrayList uses an Object array; LinkedList uses a doubly linked list (a circular linked list before JDK 1.6; JDK 1.7 removed the cycle; note the distinction between a doubly linked list and a doubly circular linked list, described below!)
  3. Whether insertion and deletion are affected by element position:
  • ArrayList uses array storage, so the time complexity of inserting and deleting depends on the element's position. For example, add(E e) appends the element to the end of the list, which is O(1). But inserting or deleting at a specified position i (add(int index, E element)) is O(n - i), because the n - i elements after position i must all be shifted backward/forward by one slot.
  • LinkedList uses linked-list storage, so insertion and deletion at the head or tail (add(E e), addFirst(E e), addLast(E e), removeFirst(), removeLast()) are approximately O(1); inserting or deleting at a specified position i (add(int index, E element), remove(Object o)) is approximately O(n), because the list must first be traversed to that position.
  4. Fast random access: LinkedList does not support efficient random access to elements, while ArrayList does. Fast random access means obtaining an element object directly by its index (the get(int index) method).
  5. Memory footprint: ArrayList wastes space mainly through the capacity reserved at the end of the list, while each LinkedList element costs more space than an ArrayList element, because every node stores its direct successor, direct predecessor, and the data.
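The complexity differences above can be seen in a small sketch (the class name ListAccessDemo is ours, not from the article):

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

public class ListAccessDemo {
    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>();
        Deque<Integer> linked = new LinkedList<>(); // LinkedList also implements Deque
        for (int i = 0; i < 5; i++) {
            array.add(i);
            linked.add(i);
        }
        // ArrayList: O(1) random access by index
        System.out.println(array.get(3));       // 3
        // LinkedList: O(1) operations at both ends
        System.out.println(linked.peekFirst()); // 0
        System.out.println(linked.peekLast());  // 4
        // Inserting at index 1 forces an ArrayList to shift the tail (O(n - i))
        array.add(1, 99);
        System.out.println(array);              // [0, 99, 1, 2, 3, 4]
    }
}
```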

1.2.2.1. Supplementary content: doubly linked list and doubly circular linked list

Doubly linked list: each node contains two pointers, prev, pointing to the previous node, and next, pointing to the next node. A doubly circular linked list additionally connects the head's prev to the tail and the tail's next to the head.

doubly linked list
doubly circular linked list

1.2.2.2. Supplementary content: RandomAccess interface

public interface RandomAccess {
}

Looking at the source, we find that the RandomAccess interface defines nothing at all. So, in my opinion, RandomAccess is just a marker. What does it mark? It identifies that the implementing class supports fast random access.

In binarySearch(), Collections checks whether the incoming list is a RandomAccess instance; if so, it calls the indexedBinarySearch() method, otherwise the iteratorBinarySearch() method:

public static <T>
int binarySearch(List<? extends Comparable<? super T>> list, T key) {
    if (list instanceof RandomAccess || list.size() < BINARYSEARCH_THRESHOLD)
        return Collections.indexedBinarySearch(list, key);
    else
        return Collections.iteratorBinarySearch(list, key);
}

ArrayList implements the RandomAccess interface, but LinkedList does not. Why? I think it relates to the underlying data structure! The bottom layer of ArrayList is an array, while LinkedList's is a linked list. Arrays naturally support random access in O(1) time, which is what fast random access means. A linked list must be traversed to a specific position before the element there can be accessed, which takes O(n), so it does not support fast random access. ArrayList implements RandomAccess to signal that it supports fast random access. The interface is merely a marker: it is not that implementing RandomAccess is what gives ArrayList the fast random access capability!
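A quick sketch of the marker check (the class name RandomAccessDemo is ours, not from the article):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

public class RandomAccessDemo {
    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        // RandomAccess carries no methods; instanceof is all there is to test
        System.out.println(arrayList instanceof RandomAccess);  // true
        System.out.println(linkedList instanceof RandomAccess); // false
    }
}
```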

1.3. The Set sub-interface of Collection

1.3.1. The difference between Comparable and Comparator

  • The Comparable interface comes from the java.lang package and has a compareTo(Object obj) method used for sorting
  • The Comparator interface comes from the java.util package and has a compare(Object obj1, Object obj2) method used for sorting

Generally, when we need custom sorting for a collection, we override the compareTo() method or supply a compare() method. When a collection needs two different sort orders, for example sorting song objects by song name in one place and by singer name in another, we can override compareTo() and additionally supply a Comparator, or supply two Comparators to implement the two orders. The second approach means calling the two-argument version of Collections.sort().

1.3.1.1. Comparator custom sorting

ArrayList<Integer> arrayList = new ArrayList<Integer>();
arrayList.add(-1);
arrayList.add(3);
arrayList.add(3);
arrayList.add(-5);
arrayList.add(7);
arrayList.add(4);
arrayList.add(-9);
arrayList.add(-7);
System.out.println("Original array:");
System.out.println(arrayList);

// void reverse(List list): reverse the order of the elements
Collections.reverse(arrayList);
System.out.println("Collections.reverse(arrayList):");
System.out.println(arrayList);

// void sort(List list): sort ascending by natural ordering
Collections.sort(arrayList);
System.out.println("Collections.sort(arrayList):");
System.out.println(arrayList);

// Custom ordering via a Comparator (here: descending)
Collections.sort(arrayList, new Comparator<Integer>() {
    @Override
    public int compare(Integer o1, Integer o2) {
        return o2.compareTo(o1);
    }
});
System.out.println("After custom sorting:");
System.out.println(arrayList);

Output:

Original array:
[-1, 3, 3, -5, 7, 4, -9, -7]
Collections.reverse(arrayList):
[-7, -9, 4, 7, -5, 3, 3, -1]
Collections.sort(arrayList):
[-9, -7, -5, -1, 3, 3, 4, 7]
After custom sorting:
[7, 4, 3, 3, -1, -5, -7, -9]

1.3.1.2. Override the compareTo method to sort by age

/**
 * The Person class must implement the Comparable interface so that
 * the entries in the TreeMap can be kept in sorted order without error.
 * The String class used in the earlier example already implements
 * Comparable by default (see its API documentation), and classes such
 * as Integer also already implement Comparable, so they need nothing extra.
 **/
public class Person implements Comparable<Person> {
    private String name;
    private int age;

    public Person(String name, int age) {
        super();
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    /**
     * Task: override the compareTo method to sort by age
     **/
    @Override
    public int compareTo(Person o) {
        if (this.age > o.getAge()) {
            return 1;
        }
        if (this.age < o.getAge()) {
            return -1;
        }
        return 0;
    }
}

public static void main(String[] args) {
    TreeMap<Person, String> pdata = new TreeMap<Person, String>();
    pdata.put(new Person("zhangsan", 30), "zhangsan");
    pdata.put(new Person("lisi", 20), "lisi");
    pdata.put(new Person("wangwu", 10), "wangwu");
    pdata.put(new Person("Xiaohong", 5), "Xiaohong");
    // iterate over the keys and print the value derived from each key
    Set<Person> keys = pdata.keySet();
    for (Person key : keys) {
        System.out.println(key.getAge() + "-" + key.getName());
    }
}

Output:

5-Xiaohong
10-wangwu
20-lisi
30-zhangsan

1.3.2. What is the meaning of disorder and non-repeatability

1. What is disorder? Disorder is not the same as randomness. Disorder means that stored data is not added to the underlying array in index order; instead, the position is determined by the hash value of the data.

2. What is non-repeatability? Non-repeatability means that an added element returns false when compared with existing elements via equals(); both the equals() method and the hashCode() method need to be overridden.

1.3.3. Compare the similarities and differences between HashSet, LinkedHashSet and TreeSet

HashSet is the main implementation class of the Set interface; its bottom layer is a HashMap; it is not thread-safe, and it can store a null value;

LinkedHashSet is a subclass of HashSet that can be traversed in insertion order;

TreeSet uses a red-black tree at the bottom; its elements are ordered, and the sorting can be natural ordering or custom ordering.
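The ordering differences can be observed directly (the class name SetOrderDemo is ours, not from the article):

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SetOrderDemo {
    public static void main(String[] args) {
        List<Integer> data = List.of(5, 1, 4, 1, 2);
        // LinkedHashSet: insertion order, duplicates dropped
        Set<Integer> linked = new LinkedHashSet<>(data);
        // TreeSet: sorted order (natural ordering here)
        Set<Integer> tree = new TreeSet<>(data);
        System.out.println(linked); // [5, 1, 4, 2]
        System.out.println(tree);   // [1, 2, 4, 5]
        // HashSet iteration order depends on hash buckets and is not guaranteed
        System.out.println(new HashSet<>(data).contains(4)); // true
    }
}
```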

1.4 The Queue sub-interface of Collection

1.4.1 The difference between Queue and Deque

Queue is a single-ended queue: elements can only be inserted at one end and removed at the other, generally following the first-in, first-out (FIFO) rule.

Queue extends the Collection interface. According to how a failed operation (e.g. due to capacity) is handled, its methods fall into two groups: one throws an exception on failure, the other returns a special value.

| Queue interface | Throws an exception | Returns a special value |
| --- | --- | --- |
| Insert at the tail | add(E e) | offer(E e) |
| Remove the head | remove() | poll() |
| Examine the head | element() | peek() |
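The two failure modes in the table can be demonstrated on an empty queue (the class name QueueApiDemo is ours, not from the article):

```java
import java.util.ArrayDeque;
import java.util.NoSuchElementException;
import java.util.Queue;

public class QueueApiDemo {
    public static void main(String[] args) {
        Queue<String> q = new ArrayDeque<>();
        // special-value group: null on an empty queue
        System.out.println(q.poll()); // null
        System.out.println(q.peek()); // null
        // exception group: throws on an empty queue
        try {
            q.remove();
        } catch (NoSuchElementException e) {
            System.out.println("remove() threw NoSuchElementException");
        }
        q.offer("a");
        System.out.println(q.element()); // a
    }
}
```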

Deque It is a double-ended queue, and elements can be inserted or deleted at both ends of the queue.

Deque extends the Queue interface, adding methods to insert and remove at both the head and the tail of the queue; these are likewise divided into two groups by how failure is handled:

| Deque interface | Throws an exception | Returns a special value |
| --- | --- | --- |
| Insert at the head | addFirst(E e) | offerFirst(E e) |
| Insert at the tail | addLast(E e) | offerLast(E e) |
| Remove the head | removeFirst() | pollFirst() |
| Remove the tail | removeLast() | pollLast() |
| Examine the head | getFirst() | peekFirst() |
| Examine the tail | getLast() | peekLast() |

In fact, Deque also provides push(), pop(), and related methods, so it can be used to simulate a stack.
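A minimal stack simulation with push()/pop() (the class name StackDemo is ours, not from the article):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackDemo {
    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>();
        // push/pop operate on the head of the deque: last in, first out
        stack.push(1);
        stack.push(2);
        stack.push(3);
        System.out.println(stack.pop());  // 3
        System.out.println(stack.peek()); // 2
    }
}
```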

1.4.2 The difference between ArrayDeque and LinkedList

ArrayDeque and LinkedList both implement the Deque interface and both can serve as queues, but what is the difference between them?

  • ArrayDeque is implemented on top of a variable-length array with double pointers, while LinkedList is implemented with a linked list.
  • ArrayDeque does not support storing null, but LinkedList does.
  • ArrayDeque was introduced in JDK 1.6, while LinkedList has existed since JDK 1.2.
  • ArrayDeque may have to grow its array during insertion, but the amortized cost of insertion is still O(1). LinkedList never needs to grow, but every insertion must allocate a new node on the heap, which on average is slower.

From a performance standpoint, ArrayDeque is a better queue choice than LinkedList. In addition, ArrayDeque can also be used to implement a stack.
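The null restriction noted above is easy to verify (the class name NullElementDemo is ours, not from the article):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

public class NullElementDemo {
    public static void main(String[] args) {
        List<String> linked = new LinkedList<>();
        linked.add(null);                  // LinkedList accepts null
        System.out.println(linked.size()); // 1
        Deque<String> arrayDeque = new ArrayDeque<>();
        try {
            arrayDeque.add(null);          // ArrayDeque rejects null
        } catch (NullPointerException e) {
            System.out.println("ArrayDeque: NullPointerException");
        }
    }
}
```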

1.4.3 Talk about PriorityQueue

PriorityQueue was introduced in JDK 1.5. Its difference from Queue is that the dequeue order is related to priority: the element with the highest priority always leaves the queue first.

Here are some relevant points:

  • PriorityQueue is implemented with the binary-heap data structure, and the bottom layer uses a variable-length array to store the data
  • PriorityQueue inserts elements and removes the top of the heap in O(log n) time, by sifting heap elements up and down
  • PriorityQueue is not thread-safe, and it does not support storing null or non-comparable objects
  • PriorityQueue is a min-heap by default, but it can take a Comparator as a constructor parameter to customize element priority

PriorityQueue appears often in hand-written algorithm interview questions. Typical examples include heap sort, finding the K-th largest number, and traversing weighted graphs, so you need to be proficient with it.
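The default min-heap behavior and a Comparator-customized priority, side by side (the class name PriorityQueueDemo is ours, not from the article):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class PriorityQueueDemo {
    public static void main(String[] args) {
        // default: natural ordering, smallest element dequeues first
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        // custom priority via a constructor Comparator: largest first
        PriorityQueue<Integer> maxHeap =
                new PriorityQueue<>(Comparator.reverseOrder());
        for (int n : new int[]{5, 1, 4, 2}) {
            minHeap.offer(n);
            maxHeap.offer(n);
        }
        System.out.println(minHeap.poll()); // 1
        System.out.println(maxHeap.poll()); // 5
    }
}
```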

1.5. Map interface

1.5.1. The difference between HashMap and Hashtable

  1. Thread safety: HashMap is not thread-safe; Hashtable is thread-safe, because its internal methods are basically all modified with synchronized. (If you need thread safety, use ConcurrentHashMap instead!);
  2. Efficiency: because of the thread-safety overhead, HashMap is somewhat more efficient than Hashtable. Also, Hashtable is basically obsolete; do not use it in new code;
  3. Support for null keys and null values: HashMap can store null keys and null values, but there can be only one null key (and any number of null values); Hashtable allows neither null keys nor null values, otherwise a NullPointerException is thrown.
  4. Initial capacity and growth: ① if no initial capacity is specified at creation, Hashtable's default initial size is 11, and each expansion grows the capacity from n to 2n+1; HashMap's default initial size is 16, and each expansion doubles the capacity. ② If an initial capacity is given, Hashtable uses the given size directly, while HashMap rounds it up to a power of 2 (guaranteed by the tableSizeFor() method; source code given below). That is, HashMap always uses a power of 2 as the size of the hash table; why it must be a power of 2 is explained later.
  5. Underlying data structure: since JDK 1.8, HashMap has made significant changes to resolving hash collisions: when a chain's length exceeds the threshold (8 by default), the linked list is converted into a red-black tree to reduce search time (before converting, if the current array length is less than 64, the array is expanded first instead of converting to a red-black tree). Hashtable has no such mechanism.
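Point 3 (null handling) can be demonstrated directly (the class name NullKeyDemo is ours, not from the article):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "v1"); // one null key is allowed
        hashMap.put("k", null);  // null values are allowed
        System.out.println(hashMap.get(null)); // v1
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "v1"); // Hashtable rejects null keys
        } catch (NullPointerException e) {
            System.out.println("Hashtable: NullPointerException");
        }
    }
}
```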

The HashMap constructor that takes an initial capacity:

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

The following method ensures that HashMap always uses a power of 2 as the size of the hash table.

/**
* Returns a power of two size for the given target capacity.
*/
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

1.5.2. The difference between HashMap and HashSet

If you look at the HashSet source code, you will see that HashSet's bottom layer is implemented on top of HashMap. (The HashSet source is very, very small: apart from clone(), writeObject(), and readObject(), which HashSet has to implement itself, the other methods directly call HashMap methods.)

| HashMap | HashSet |
| --- | --- |
| Implements the Map interface | Implements the Set interface |
| Stores key-value pairs | Stores only objects |
| Calls put() to add elements | Calls add() to add elements |
| Computes the hashcode from the key | Computes the hashcode from the member object; since two objects may have the same hashcode, equals() is used to determine equality |

1.5.3. The difference between HashMap and TreeMap

TreeMap and HashMap both inherit from AbstractMap, but note that TreeMap additionally implements the NavigableMap interface and the SortedMap interface.

Implementing the NavigableMap interface gives TreeMap the ability to navigate and search among the elements of the collection.

Implementing the SortedMap interface gives TreeMap the ability to sort its elements by key. The default is ascending order by key, but we can also specify a sorting comparator. The sample code is as follows:

public class Person {
private Integer age;

public Person(Integer age) {
this.age = age;
}

public Integer getAge() {
return age;
}

public static void main(String[] args) {
TreeMap<Person, String> treeMap
= new TreeMap<>(new Comparator<Person>() {
@Override
public int compare(Person person1, Person person2) {
int num = person1.getAge() - person2.getAge();
return Integer.compare(num, 0);
}
});
treeMap.put(new Person(3), "person1");
treeMap.put(new Person(18), "person2");
treeMap.put(new Person(35), "person3");
treeMap.put(new Person(16), "person4");
treeMap.entrySet()
    .stream()
    .forEach(personStringEntry -> {
        System.out.println(personStringEntry.getValue());
    });
}
}

Output:

person1
person4
person2
person3

As can be seen, the TreeMap elements are arranged in ascending order of the Person age field.

Above, we implemented this by passing in an anonymous inner class. You can replace that code with a Lambda expression:

TreeMap<Person, String> treeMap = new TreeMap<>(
(person1, person2) -> {
int num = person1.getAge() - person2.getAge();
return Integer.compare(num, 0);
});

In summary, compared with HashMap, TreeMap mainly adds the ability to sort elements by key and the ability to navigate and search among the elements of the collection.

1.5.4. How does HashSet check for duplicates

The following is an excerpt from the second edition of “Head First Java”:

When you add an object to a HashSet, the HashSet first computes the object's hashcode value to determine where to place it, and compares it with the hashcodes of the objects already present. If no hashcode matches, HashSet assumes the object is not a duplicate. But if an object with the same hashcode value is found, the equals() method is called to check whether the two objects are really equal. If they are, the add operation does not succeed.

In OpenJDK 8, HashSet's add() method simply calls HashMap's put() method and judges the return value to determine whether the element already existed. Look directly at the HashSet source:

// Returns: true if this set did not already contain the specified element
public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}

The following comment can also be seen on HashMap's putVal() method:

// Returns: previous value, or null if none
// i.e. if there was no element at the insertion position, null is returned;
// otherwise the previous element is returned
final V putVal(int hash,
               K key,
               V value,
               boolean onlyIfAbsent,
               boolean evict) {
    ...
}

That is to say, in OpenJDK 8, HashSet performs the insertion regardless of whether the element already exists; the return value of the add() method tells us whether an equal element was present before the insertion.
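That return-value contract is easy to observe (the class name HashSetAddDemo is ours, not from the article):

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetAddDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("java")); // true: element was not present
        System.out.println(set.add("java")); // false: duplicate, set unchanged
        System.out.println(set.size());      // 1
    }
}
```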

The relevant rules for hashCode() and equals():

  1. If two objects are equal, their hashcodes must also be the same
  2. If two objects are equal, calling equals() on either returns true
  3. Two objects with the same hashcode value are not necessarily equal
  4. Therefore, if the equals() method is overridden, the hashCode() method must be overridden as well
  5. The default behavior of hashCode() is to generate distinct values for distinct objects on the heap. If hashCode() is not overridden, two instances of the class will never be considered equal (even if they contain the same data).

The difference between == and equals

For primitive types, == compares whether the values are equal;

For reference types, == compares whether the two references point to the same object address (whether both point to the same place in heap memory);

For reference types (including wrapper types), if equals() is not overridden, it compares addresses just like ==; if equals() is overridden (as in String), it compares contents.
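A minimal sketch of the three cases (the class name EqualsDemo is ours, not from the article):

```java
public class EqualsDemo {
    public static void main(String[] args) {
        String a = new String("hi");
        String b = new String("hi");
        // two distinct objects: == compares addresses
        System.out.println(a == b);      // false
        // String overrides equals() to compare contents
        System.out.println(a.equals(b)); // true
        // primitives: == compares values
        int x = 1000, y = 1000;
        System.out.println(x == y);      // true
    }
}
```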

1.5.5. The underlying implementation of HashMap

1.5.5.1. Before JDK1.8

Before JDK 1.8, the bottom layer of HashMap was an array combined with linked lists, i.e. chained hashing. HashMap obtains the hash value by passing the key's hashCode through a perturbation function, then determines the storage position via (n - 1) & hash (where n is the length of the array). If an element already exists at that position, its hash value and key are compared with those of the element to be stored: if they are the same, the value is overwritten directly; if not, the conflict is resolved through the zipper method.

The so-called perturbation function is HashMap's hash method. It is used to defend against poorly implemented hashCode() methods; in other words, using the perturbation function reduces collisions.

Source code of the hash method of the JDK 1.8 HashMap:

The JDK 1.8 hash method is more streamlined than the JDK 1.7 version, but the principle is unchanged.

static final int hash(Object key) {
    int h;
    // key.hashCode(): the key's hash value, i.e. its hashcode
    // ^  : bitwise XOR
    // >>>: unsigned right shift; the sign bit is ignored and zeros are shifted in
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

For comparison, the source code of the hash method of the JDK 1.7 HashMap:

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

Compared with the JDK 1.8 hash method, the JDK 1.7 version performs slightly worse, since after all it perturbs the hash four times.

The so-called “zipper method” is to combine a linked list with an array. In other words, create a linked list array, and each cell in the array is a linked list. If you encounter a hash conflict, add the conflicted value to the linked list.

1.5.5.2. After JDK1.8

Compared with previous versions, JDK 1.8 changed hash-collision resolution significantly: when the length of a linked list exceeds the threshold (8 by default), the list is converted into a red-black tree to reduce search time (before converting, if the current array length is less than 64, the array is expanded first instead of converting to a red-black tree).

Red-black trees are used at the bottom of TreeMap, TreeSet, and (since JDK 1.8) HashMap. The red-black tree addresses the shortcoming of the binary search tree, which can degenerate into a linear structure in some cases.

1.5.6. Why the length of HashMap is a power of 2

To make HashMap access efficient, there should be as few collisions as possible, i.e. the data should be distributed as evenly as possible. As mentioned above, the range of hash values is -2147483648 to 2147483647, about 4 billion possible values; as long as the hash function maps fairly uniformly and loosely, ordinary applications rarely collide within that space. But an array of 4 billion slots cannot fit in memory, so the hash value cannot be used directly. Before use, it must be reduced modulo the length of the array, and the remainder is the array subscript of the storage position. The subscript is computed as "(n - 1) & hash" (where n is the length of the array). This is tied to why the length of HashMap is a power of 2.

How should this algorithm be designed?

We might first think of implementing it with the remainder (%) operation. But the important point is this: in the remainder operation, if the divisor is a power of 2, it is equivalent to an AND (&) with the divisor minus one; that is, hash % length == hash & (length - 1), provided that length is 2 to the n-th power. And the binary & operation is more efficient than %, which explains why the length of HashMap is a power of two.
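The equivalence can be spot-checked for non-negative hashes (the class name PowerOfTwoDemo is ours, not from the article; note that for negative values % and & differ, which is one reason HashMap uses the mask directly):

```java
public class PowerOfTwoDemo {
    public static void main(String[] args) {
        int length = 16; // a power of two, like HashMap's table size
        for (int hash : new int[]{7, 35, 123456}) {
            // for power-of-two lengths, the cheap mask equals the modulo
            System.out.println((hash % length) == (hash & (length - 1))); // true
        }
    }
}
```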

1.5.7. HashMap multi-threaded operation causes infinite loop problem

The main reason is that rehashing under concurrency could form a circular linked list among the elements. This particular problem was fixed in JDK 1.8, but it is still not recommended to use HashMap from multiple threads, because doing so can still cause other problems such as data loss. ConcurrentHashMap is recommended in a concurrent environment.

1.5.8. What are the common traversal methods of HashMap?

1. Iterator EntrySet

public class HashMapTest {
public static void main(String[] args) {
Map<Integer, String> map = new HashMap<>();
map.put(1, "Java");
map.put(2, "JDK");
map.put(3, "Spring Framework");
map.put(4, "MyBatis framework");
map.put(5, "Java");

Iterator<Map.Entry<Integer, String>> iterator
= map.entrySet().iterator();
while (iterator.hasNext()) {
Map.Entry<Integer, String> entry = iterator.next();
System.out.println(entry.getKey());
System.out.println(entry.getValue());
}
}
}

The execution result of the above program is:

1 
Java
2
JDK
3
Spring Framework
4
MyBatis framework
5
Java

2. Iterator KeySet

Iterator<Integer> iterator = map.keySet().iterator();
while (iterator.hasNext()) {
Integer key = iterator.next();
System.out.println(key);
System.out.println(map.get(key));
}

3.ForEach EntrySet

for (Map.Entry<Integer, String> entry : map.entrySet()) {
System.out.println(entry.getKey());
System.out.println(entry.getValue());
}

4.ForEach KeySet

for (Integer key : map.keySet()) {
System.out.println(key);
System.out.println(map.get(key));
}

5. Lambda

map.forEach((key, value) -> {
    System.out.println(key);
    System.out.println(value);
});

6. Streams API (single-threaded)

map.entrySet().stream().forEach((entry) -> {
    System.out.println(entry.getKey());
    System.out.println(entry.getValue());
});

7. Streams API (multi-threaded)

map.entrySet().parallelStream().forEach((entry) -> {
    System.out.println(entry.getKey());
    System.out.println(entry.getValue());
});

Performance testing:

In the benchmark output, Units is ns/op (execution time in nanoseconds), Score is the average execution time, and the ± symbol indicates the error. The results show that the two entrySet-based traversals perform similarly and are the fastest, followed by the stream versions, while the keySet-based traversals perform worst.

Conclusion

As the results show, entrySet traversal performs roughly twice as well as keySet traversal, because keySet requires a second lookup (map.get(key)) for every key. So prefer entrySet when traversing a Map.

1.5.9. The difference between ConcurrentHashMap and Hashtable

The difference between ConcurrentHashMap and Hashtable lies mainly in how each achieves thread safety.

  • Underlying data structure: In JDK 1.7, ConcurrentHashMap used an array of segments, each holding an array of linked-list buckets. Since JDK 1.8 it uses the same structure as HashMap 1.8: an array of buckets with linked lists / red-black trees. Hashtable, like HashMap before JDK 1.8, uses an array plus linked lists, where the array is the body and the linked lists exist mainly to resolve hash collisions.
  • How thread safety is achieved (important): ① ConcurrentHashMap (segment locking in JDK 1.7): the bucket array was split into segments (Segment), and each lock guarded only part of the data in the container, so threads accessing different segments did not contend for the same lock, which improved concurrent throughput. JDK 1.8 abandoned the Segment concept and works directly on the Node array + linked list + red-black tree structure, using synchronized and CAS for concurrency control (synchronized has been heavily optimized since JDK 1.6). The whole thing looks like an optimized, thread-safe HashMap; the Segment class still appears in the JDK 1.8 source, but only in simplified form, kept for compatibility with older versions. ② Hashtable (one lock for everything): uses synchronized on its methods to ensure thread safety, which is very inefficient. When one thread is inside a synchronized method, any other thread calling a synchronized method blocks or spins. For example, while one thread is executing put, no other thread can put or get; contention grows and efficiency drops.

Comparison diagrams of the two:

Hashtable:

ConcurrentHashMap in JDK 1.7:

ConcurrentHashMap in JDK 1.8:

In JDK 1.8, ConcurrentHashMap is no longer Segment array + HashEntry array + linked list, but Node array + linked list / red-black tree. Node is used only for linked-list buckets; red-black tree buckets use TreeNode instead. When a collision chain grows beyond a certain length, the linked list is converted into a red-black tree.

1.5.10. ConcurrentHashMap thread safety: concrete/underlying implementation

1.5.10.1. JDK1.7 (there is a schematic diagram above)

The data is first divided into segments of storage, and each segment is assigned its own lock. When one thread holds the lock to access one segment of data, the other segments remain accessible to other threads.

ConcurrentHashMap is composed of an array of Segment structures, each of which contains an array of HashEntry structures.

Segment extends ReentrantLock, so a Segment is a reentrant lock that plays the role of the lock; HashEntry is used to store the key-value pair data.

static class Segment<K, V>
        extends ReentrantLock
        implements Serializable {
}

A ConcurrentHashMap contains a Segment array. A Segment is structured like a HashMap: an array plus linked lists. Each Segment contains a HashEntry array, and each HashEntry is a node of a linked list. Every Segment guards its own HashEntry array, so a thread must first acquire the corresponding Segment lock before modifying that array's data.
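The idea can be illustrated with a toy "lock striping" map (a hypothetical class for illustration only, not the actual JDK 1.7 ConcurrentHashMap source): one lock per segment, so threads touching different segments never block each other.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration of segment locking, not the real implementation.
public class StripedMap<K, V> {
    private static final int SEGMENTS = 16;
    private final ReentrantLock[] locks = new ReentrantLock[SEGMENTS];
    private final Map<K, V>[] tables;

    @SuppressWarnings("unchecked")
    public StripedMap() {
        tables = new Map[SEGMENTS];
        for (int i = 0; i < SEGMENTS; i++) {
            locks[i] = new ReentrantLock();
            tables[i] = new HashMap<>();
        }
    }

    // pick a segment from the key's hash (mask off the sign bit)
    private int segmentFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % SEGMENTS;
    }

    public V put(K key, V value) {
        int s = segmentFor(key);
        locks[s].lock();          // only this segment is locked
        try {
            return tables[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    public V get(K key) {
        int s = segmentFor(key);
        locks[s].lock();
        try {
            return tables[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }
}
```

Two threads writing keys that hash to different segments proceed in parallel; only writes to the same segment serialize.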

1.5.10.2. JDK1.8 (there is a schematic diagram above)

ConcurrentHashMap drops the Segment lock and uses CAS plus synchronized to ensure concurrency safety. The data structure is the same as HashMap in JDK 1.8: array + linked list / red-black tree. Java 8 converts a linked list (lookup in O(n)) into a red-black tree (lookup in O(log n)) when the list's length exceeds a threshold (8).

synchronized locks only the first node of the current linked list or red-black tree, so as long as hashes do not collide on the same bucket, there is no lock contention, which greatly improves efficiency.
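In practice this means per-key updates can run in parallel, and atomic helpers such as merge() or computeIfAbsent() avoid explicit locking in user code entirely. A small usage sketch:

```java
import java.util.concurrent.ConcurrentHashMap;

public class WordCountDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        String[] words = {"java", "map", "java"};
        for (String w : words) {
            // merge() is atomic: safe even when many threads count at once
            counts.merge(w, 1, Integer::sum);
        }
        System.out.println(counts.get("java")); // 2
    }
}
```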

1.6. The Collections utility class

Common methods of the Collections utility class:

  1. Sort
  2. Find, replace operation
  3. Synchronization control (not recommended, please consider using the concurrent collection under the JUC package when thread-safe collection types are required)

1.6.1. Sorting operations

void reverse(List list) // Reverse the list
void shuffle(List list) // Shuffle randomly
void sort(List list) // Sort in ascending natural order
void sort(List list, Comparator c) // Custom sort; the Comparator controls the ordering
void swap(List list, int i, int j) // Swap the elements at two index positions
void rotate(List list, int distance) // Rotate. When distance is positive, the last distance elements of the list move to the front as a whole; when distance is negative, the first distance elements move to the back as a whole
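A quick demo of the sorting operations above (class name is just for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortOpsDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(3, 1, 2));
        Collections.sort(list);                            // [1, 2, 3]
        Collections.reverse(list);                         // [3, 2, 1]
        Collections.swap(list, 0, 2);                      // [1, 2, 3]
        Collections.rotate(list, 1);                       // [3, 1, 2]
        Collections.sort(list, Comparator.reverseOrder()); // [3, 2, 1]
        System.out.println(list);                          // prints [3, 2, 1]
    }
}
```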

1.6.2. Find and replace operations

int binarySearch(List list, Object key) // Binary search on the List, returning the index; note the List must be sorted
int max(Collection coll) // Return the largest element by natural ordering; analogous: int min(Collection coll)
int max(Collection coll, Comparator c) // Return the largest element under a custom ordering controlled by the Comparator; analogous: int min(Collection coll, Comparator c)
void fill(List list, Object obj) // Replace all elements in the specified list with the specified element
int frequency(Collection c, Object o) // Count the number of occurrences of an element
int indexOfSubList(List list, List target) // Index of the first occurrence of target in list, or -1 if not found; analogous: int lastIndexOfSubList(List source, List target)
boolean replaceAll(List list, Object oldVal, Object newVal) // Replace old elements with new elements
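And a demo of the find-and-replace operations (class name is just for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SearchOpsDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 2, 3, 5));
        // binarySearch requires the list to be sorted
        System.out.println(Collections.binarySearch(list, 3)); // 3
        System.out.println(Collections.max(list));             // 5
        System.out.println(Collections.frequency(list, 2));    // 2
        Collections.replaceAll(list, 2, 9);                    // [1, 9, 9, 3, 5]
        Collections.fill(list, 0);                             // [0, 0, 0, 0, 0]
        System.out.println(list);
    }
}
```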

1.6.3. Synchronous control

Collections provides several synchronizedXxx() methods. They wrap the specified collection into a thread-synchronized one, solving the thread-safety problem when multiple threads access the collection concurrently.

We know that HashSet, TreeSet, ArrayList, LinkedList, HashMap and TreeMap are not thread-safe. Collections provides static methods that wrap them into thread-synchronized collections.

It is best not to use the following methods. The efficiency is very low. If you need a thread-safe collection type, please consider using the concurrent collection under the JUC package.

Methods as below:

// Returns a synchronized (thread-safe) collection backed by
// the specified collection.
synchronizedCollection(Collection<T> c)
// Returns a synchronized (thread-safe) List backed by the
// specified list.
synchronizedList(List<T> list)
// Returns a synchronized (thread-safe) Map backed by the
// specified map.
synchronizedMap(Map<K, V> m)
// Returns a synchronized (thread-safe) Set backed by the
// specified set.
synchronizedSet(Set<T> s)
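Usage sketch: individual calls on the wrapper are synchronized, but iteration must still be guarded manually, as the Javadoc itself requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedWrapperDemo {
    public static void main(String[] args) {
        List<String> list = Collections.synchronizedList(new ArrayList<>());
        list.add("a");
        list.add("b");
        // Each add() above is synchronized internally, but compound
        // operations like iteration need an explicit lock on the wrapper:
        synchronized (list) {
            for (String s : list) {
                System.out.println(s);
            }
        }
    }
}
```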
