Internal working of Hash Map in java

If it's smart, it's vulnerable.

Introduction

HashMap is one of the most used data structures in a production environment; it is widely used in programming and can be found as a critical element for many algorithms. Its primary advantage is that the insert, Search, and Delete operation is performed in O(1) !!!. Before moving forward to understand HashMap, let’s understand the basics first.

What is hashing?

The process of converting an object or primitive data type into some "representative/practical" value is known as hashing. Hashing is used for faster searching of elements and efficient indexing. The algorithm or method used for this process is known as a hash function. Mathematically speaking, a hash function is a one-way mathematical function capable of mapping an input set of values to another (usually, a much smaller set) of values. E.g.:

h("DarkRelay") = 1, h("CyberSec") = 10

Examples of a hash function

Let's consider the following two examples to understand what a hash function looks like

For large integer Keys: using modulo function i.e. h(key) = (key % m)

Here, "h" denotes the hash function, "m" is any integer, resulting in a value in the range of [0, m-1] (both included) for any given key and "%" denotes modulo operation, which returns the remainder from the division of "key" with "m."

To understand this better, let's try to solve: h(12,29,52,19,60) where m = 5. For a key value of "12", the hash value becomes h(12) = 12%5 i.e., 2. Therefore, h(12) = 2. Similarly,

h(29) = 29%5 = 4
h(52) = 52%5 = 2
h(19) = 19%5 = 4
h(60) = 60%5 = 0

For string keys: using weighted sum

We can use a technique such as the "weighted Sum method" to calculate the hash of string values. i.e. (s[0]*x⁰ + s[1]*x¹ + s[1]*x² +....... and so on) % m

Here, "s [0]" represent the ASCII value of the character present at the 0th index, "x" is any prime number, and "m" is an integer. Let's try to understand this with an example: h("darkrelay") where x = 3 and m =5.

Now, checking up the ASCII value of characters in question, we get:

"d" = 100, "a" = 97,  "r" = 114, "k" = 107, "r" = 114, "e" = 101, "l" = 108 and "y" = 121
assuming x = 3 and . m = 3  the equation becomes:
(100*3⁰ + 97*3¹ + 114*3² + 107*3³ + 114*3⁴ + 101*3⁵ + 108*3⁶ + 97*3⁷ + 121*3⁸) % 3 =
	(100*1 + 97*3 + 114*9 + 107*27 + 114*81 + 101*243 + 108*729 + 97*2187 + 121*6561) % 3 =
	(1122835) % 3 = 1
	Therefore, h("darkrelay") = 1

What is a "map"?

According to Oracle's Java documentation:

A Map is an object that maps keys to values. A map cannot contain duplicate keys: Each key can map to at most one value. It models the mathematical function abstraction. The Map interface includes methods for basic operations (such as put, get, remove, containsKey, containsValue, size, and empty), bulk operations (such as putAll and clear), and collection views (such as keySet, entrySet, and values).

Defining a HashMap

A HashMap is a Hash table-based implementation of the Map interface. A HashMap is denoted by <Key, Value>. It extends AbstractMap<K, V> class and implements Map, Cloneable, and Serializable Interface. The basic syntax is as follows

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable

Some important things to remember about hash maps regarding java before we make a deeper dive are:·

Insert, Search, Delete operation is performed in O(1), making them an ideal choice for cases where the most frequent operation is a search operation
Insertion order is not preserved, rather it is based on the hash code of the keys
Duplicate keys are not allowed, but values can be duplicates
Heterogeneous objects are allowed for both keys and value
null is allowed for key(only once), null is allowed for values(any no. of time)
HashMap implements Serializable and Cloneable interfaces but not RandomAccess

HashMaps in java

With reference to java, the hash map is a class that implements the Map interface allowing the user to store, read and process "key and value pairs", given that the keys are unique. A few important things to keep in mind when working with hashmaps are:

The underlying data structure is Hashtable.
By default, it initializes with a capacity of 16.
HashMap in java does not preserve insertion/updating order.
Heterogeneous objects are allowed for both keys and value
It restricts that EACH KEY in a given map must be unique.
HashMap implements Serialisable and Clone-able interfaces but not RandomAccess.
HashMap in java only allows one instance of a null key, whereas there can be multiple null values.

Internally, HashMap leverages HashTable (HashTable is an Array of Node where each node contains Key, value, a hash of the key, and an extra Node to point another node if a Hash collision happens) to store the data as an array of "Entry". So, what is this "Entry" we are talking about here? Essentially, Entry is a static class that implements Map.Entry Interface whose implementation is given below:

static class Entry implements Map.Entry<K,V> {
    final K Key;
    V value;
    Entry<K,V> next;
    final int hash;
    ......// some more code.

Here, as you might have observed, "Key" and "hash" are declared as final as they are not supposed to change. The "Value" can change any number of times, and we have a "next" Entry that can refer to the next node.

Initializing a new HashMap

"HashMap<K, V> m = new HashMap<K, V> ();"

Creates an empty HashMap with an initial default capacity of 16 and a default fill ratio of 0.75.

"HashMap<K, V> m = new HashMap <K, V> (int initialCapacity);"

Creates an empty HashMap with specified initial capacity and default fill ratio of 0.75

"HashMap<K, V> m = new HashMap<K, V>(int initialCapacity, float fillRatio);"

Create an empty HashMap with Initial Capacity and Fill Ratio as fillRatio.

"HashMap m = new HashMap <K, V>(Map map);"

Create an instance of HashMap with the exact mapping specified in the map.

Internal working of HashMap

At this point, it should be pretty clear what a HashMap does; The simplest put it is that it stores the data in key-value pair. We can perform multiple operations on them, such as "putting" data into a HashMap, "getting" data from it, deleting it in the HashMap, and so on. Suppose we can understand the "how" for putting and getting the data from a HashMap. In that case, we can extrapolate the same basic principles to understand the remaining functionalities, as they follow the same basic working principle. So, let’s divide our task into two parts; first, we will look into put() method and then the get() method, and our job is done!

Internal working of the put() method

To understand the internal workings, let’s quickly glance over the put method's implementation in HashMap.

public V put(K key, V value) {
	return putVal(hash(key), key, value, false, true);
}
	
	
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
	Node<K,V>[] tab; Node<K,V> p; int n, i;
	if ((tab = table) == null || (n = tab.length) == 0)
		n = (tab = resize()).length;
	if ((p = tab[i = (n - 1) & hash]) == null)
		tab[i] = newNode(hash, key, value, null);
	else {
		Node<K,V> e; K k;
		if (p.hash == hash &&
			((k = p.key) == key || (key != null && key.equals(k))))
			e = p;
		else if (p instanceof TreeNode)
			e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
		else {
			for (int binCount = 0; ; ++binCount) {
				if ((e = p.next) == null) {
					p.next = newNode(hash, key, value, null);
					if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
						treeifyBin(tab, hash);
					break;
				}
				if (e.hash == hash &&
					((k = e.key) == key || (key != null && key.equals(k))))
					break;
				p = e;
			}
		}
		if (e != null) { // existing mapping for key
			V oldValue = e.value;
			if (!onlyIfAbsent || oldValue == null)
				e.value = value;
			afterNodeAccess(e);
			return oldValue;
		}
	}
	++modCount;
	if (++size > threshold)
		resize();
	afterNodeInsertion(evict);
	return null;

Let's breakdown the above code into digestible sizes:

We will check the key object for null. We know that hashCode of null is 0. So, if the key is null, it will be stored at index 0 of the Hashtable.
We will call the hashCode () method if it is not null. This hash value will be used to find the position of the table for storing Entry.
indexFor (hash, table.length) method will be called to find the exact index for that specific Entry.

Before moving on, we need to address a significant issue, i.e. hash collisions, and how to handle them. A hash collision occurs when two objects have the same hashCode value, so they have to be stored in the same array position, also known as a "bucket." One might wonder, how will we store multiple objects at the exact array location called bucket ???

The answer can be guessed easily by looking at the Entry class, which contains the next Entry indicating the LinkedList data structure. Hence In case of a hash collision, we will store the Entry object in LinkedList. For storing this Entry, HashMap checks the next of an existing Entry; if it is null then-current Entry will become the next node in LinkedList. If the next of the Entry is not null, it will keep on checking until it finds null, and once it finds it, it will set its next as the current Entry.

Another scenario can be, adding another Entry with the same Key as one of the existing Entry. In this case, it will update the Entry value, but how can this be achieved? Let’s understand while iterating over already existing Entries in LinkedList, HashMap will keep calling the equals method for Key on every Entry object. All these Entries in LinkedList will have similar HashCode, but the equal () method will test the object's true equality. If it finds the same key, it will update its value, and this way, it can maintain its "uniqueness of key."

Internal working of the get() method

Let's have a look at the implementation of the get method for HashMap:

/*
* Implements Map.get and related methods.
*
* @param key the key * @return the node, or null if none 
*/ 
final Node getNode(Object key) {
    Node[] tab; Node first, e; int n, hash; K k;
}
public V get(Object key) {
	Node<K,V> e;
	return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
	Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
	if ((tab = table) != null && (n = tab.length) > 0 &&
		(first = tab[(n - 1) & hash]) != null) {
		if (first.hash == hash && // always check first node
			((k = first.key) == key || (key != null && key.equals(k))))
			return first;
		if ((e = first.next) != null) {
			if (first instanceof TreeNode)
				return ((TreeNode<K,V>)first).getTreeNode(hash, key);
			do {
				if (e.hash == hash &&((k = e.key) == key || (key != null && key.equals(k))))
					return e;
			} while ((e = e.next) != null);
		}
	}
	return null;
}

Breaking down this code stub into its core composition, we get the following:

As the HashMap was putting the data in Hashtable in the same way, it will retrieve it. We will pass the Key Object then HashMap will:

Calculate the hash value and determine its index
Use an equal method while iterating over LinkedList to find the same Key
Once it finds the same Key, it will return the value
Otherwise, it will return null

Now that we have understood both get and put implementations let's try to take our example and understand its flow. We write a small java program for storing data in HashMap:

import java.util.*;
class CyberSec {
    public static void main (String [] args) { 
        HashMap m = new HashMap (); 
        m.put(14,"DarkRelay"); 
        m.put(26,"Security");
        m.put(34,"Labs"); 
        m.put(15,"is"); 
        m.put(29,"the");
        m.put(16,"best!");  
        System.out.println(m.get(14)); // DarkRelay
        }
}

For simplicity, let's assume that the hash function used was modulo of 5 i.e. h(key) = key%5. Let's understand how each of the entries is inserted into the HashMap:

Let's calculate the hash value of (14,"DarkRelay") , it is 14%5 = 4. Next, we check if an entry exists in the "4th index". Since it doesn't, we will add {14, "DarkRelay", 4, null} in the 4th bucket.

Next, the entry is (26,"Security") the index for which 26%5 = 1. Now, check if an entry exists in the "1st index". Since it doesn't, we will add {26, "Security", 1, null} in the 1st bucket.

Up next is (34,"Labs") the index comes out to be 34%5 = 4 which means the hashCode 4 will be a case of hash collision. So, the HashMap will iterate the 4th bucket and hit the already added {14, "DarkRelay", 4, null} Entry and once it finds a "next Entry" as null, it adds the current Entry {34, "Labs", 4, null}, making the 4th bucket look like: '{14, "DarkRelay", 4, Reference to next Entry}, {34, "Labs", 4, null}'

For (15,"is") the index is 15%5 = 0. Since nothing exists in the "0th index", {15, "is", 0, null} is added, a case similar to the second element.

Followed by (29,"the") whose value comes out to be 29%5 = 4. On checking the 4th index we see 2 Entry already exists in the bucket so the HashMap iterates until it finds a null value for "next Entry" updating that with current entry {29, "the", 4, null} making the 4th bucket look like: '{14, "DarkRelay", 4, Reference to next Entry}, {34, "Labs", 4, Reference to next Entry}, {29, "the", 4, null}'

At last, we have (16,"best!"), giving us a hash value of 16%5 = 1. A case similar to case 3 where the bucket updated is the 1st one. Making the 1st bucket Entry look like: '{26, "Security", 1, Reference to next Entry}, {16, 'Best",1, null}'

Let's explore the m.get(14) method here; the hash for 14 is 14%5 = 4. The HashMap will iterate over the 4th bucket and keep checking the Key's value using the equal () method until it finds a match at the First Entry '{14, "DarkRelay", 4, Reference to next Entry}' giving us a result of "DarkRelay".

Java 8 improvements for HashMap

HashMap replaces LinkedList with a self-balancing tree on crossing a certain threshold, also known as "Treefy_Threshold". LinkedList of Entries converted to tree node; that way, time complexity for each bucket will reduce from O(n) to O(log(n))

https://www.darkrelay.com/courses