Double Hashing

Double hashing is a computer programming technique used in conjunction with open addressing in hash tables to resolve hash collisions, by using a secondary hash of the key as an offset when a collision occurs. Double hashing uses the idea of applying a second hash function to key when a collision occurs.

Double Hashing Procedure

Double hashing can be done using:- (hash₁(key) + i * hash₂(key)) % Table_size; where hash₁() and hash₂() are hash functions and Table_size is size of hash talbe. This process is repeated by incrementing i when collision occurs.

Procedure

Take the key you want to store on the hash-table.
Apply the first hash function h₁(key) over your key to get the location to store the key.
If the location is empty, place the key on that location.
If the location is already filled, apply the second hash function h₂(key) in combinatioin with the first hash function h₁(key) to get the new location for the key.

Example for Double hashing

Implementation in Python

def double_hashing(keys, hashtable_size, double_hash_value):
    hashtable_list = [None] * hashtable_size
    for i in range(len(keys)):
        hashkey = keys[i] % hashtable_size
        if hashtable_list[hashkey] is None:
            hashtable_list[hashkey] = keys[i]
        else:
            new_hashkey = hashkey
            while hashtable_list[new_hashkey] is not None:
                steps = double_hash_value - (keys[i] % double_hash_value)
                new_hashkey = (new_hashkey + steps) % hashtable_size 
            hashtable_list[new_hashkey] = keys[i]
    return hashtable_list 


values = [26, 54, 94, 17, 31, 77, 44, 51]
print( double_hashing(values, 13, 5) )

Output
[26, None, 54, 94, 17, 31, 44, 51, None, None, None, None, 77]

Searching a key

The algorithm of searching a key in the hash table loops through the initial hash value while the busy slots not equal to zero in the hash table by incrementing the iterative variable by 1 and modulo by size. It checks if the key is the key we are looking for by returning true if the key is found else returning false if the key doesn't exist in the hash table. The **contains method** in the pseudo-code for searching and deletion in a hash table is the search method which searches for a key's existence in the hash table.

Time complexity for searching a key in hash table

The searching algorithm takes constant time as each lookup in the hash table costs O(1) time. In the worst case configuration, when the key is not in the array we iterate through the whole array which costs linear time. Thus, the worst case time complexity for searching a key in hash table is O(n).

Deletion of a key

Open address table doesn't use any linked lists type of data structure. We can imagine, that there are implicit lists, which can contain keys with dfferent hash values. Each busy cell in the array is the beginning of the list in open address table. Busy cell in the array can be contained in some lists. It is the main hurdle to delete the elements from these implicit lists, because we cannot mark the deleted slot as empty, as it will demolish the lists. The classical solution for saving these lists in corrct state is to mark deleted slot by special value. We can add new key into this cell, but cannot interrupt a search, if we reach such slot. The discadvantage of this method is lists can be too long which we obtain after many deletions, unnecessary space allocation to the deleted slot, as such deletion process doesn't reduce the size of the list. In the worst case we can have actually empty table with many slots deleted and marked, but search may take a lot of time, because we have to examine all the deleted(marked) slots in the probe sequence.

There is an another method to delete keys from open address hash table. After deleting the key, some gaps appear in the lists, which contain the deleted slot. Therefore, we will compress lists, which contain the deleted slot, so these lists remain in correct state. Deleting process partitions into two parts:-

Search the deleted key and mark the slot as empty.
Compress all the lists, which included the deleted key.

After marking the deleted slot as the empty slot, we continue to pass the probe sequence until we reach a free slot. We keep the index of empty slot, initially it equals index of deleted slot. If the current element in the sequence can be moved to a free slot, we move the element, and change the variable free, as the current slot becoes free. If the free slot and the current element can be in the same list, that means when we were adding the current element, we could add it in this free slot, we can move the current element but this slot was not free and we continued searching for free cell. To make this decision, we use the array variable named ex and the variable named off, where off is keeping the offset from the free slot to the current element and (ex[i]-1) equals the amount of the busy slots, which we skipped during probing, while we were adding a key. so the condition to determine to move the key or not, is ex[i] should be greater than off value.

Time complexity for deleting a key

The deletion algorithm first searches the key to be deleted taking constant time, deleting it and compresses the list. Since deleting the key and compressing the list takes constant time, the time complexity of the deletion algorithm depends on the searching time with linear worst case complexity. Thus, time complexity for the worst-case configuration is O(n).

Pseudo-code for searching and deletion of a key

class HashTable {
    final int SIZE  = 1000000; // size of the talbe
    long[] a = new long[SIZE];
    int[] ex = new int[SIZE];
    // add new key
    boolean add(long key) {
        int i = h(key), j = 1;
        for (; ex[i] != 0; i = (i+1)%SIZE, j++) {
            if (a[i] == key)
                return false;
        }
        ex[i] = j;
        a[i] = key;
        return true;
    }
    // Search method to search for a key in the table.
    // return true, if key is contained in the table
    boolean contains(long key) {
        for (int i = h(key); ex[i] != 0; i = (i+1)%SIZE) {
            if (a[i] == key)
                return true;
        }
        return false;
    }
    // Deleting a key from the table:-
    // removes key from the table
    boolean remove(long key) {
        for (int i = h(key); ex[i] != 0; i = (i+1)%SIZE) {
            if (a[i] == key) {
                ex[i] = 0;
                compress(i);
                return true;
            }
        }
    }
    // compresses lists
    void compress(int free) {
        int i = (free + i) % SIZE, off = 1;
        for (; ex[i] != 0; i = (i+1)%SIZE, off++) {
            if (ex[i] > off) {
                // move current element
                a[free] = a[i];
                ex[free] = ex[i] - off;
                // mark current slot as free
                ex[i] = 0;
                off = 0;
                free = i;
            }
        }
    }
    int h(long key) {
        return (int) Math.abs(key % SIZE);
    }
}

Advantages of Double Hashing

The advantage of Double hashing is that it is one of the best form of probing, producing a uniform distribution of records throughout a hash table.
This technique does not yield any clusters.
It is one of effective method for resolving collisions.

Disadvantage of Double Hashing

Double hashing has poor cache performance but no clustering.
Double hashing requires more computation time as two hash functions need to be computed.

Compared to other hash methods

In linear probing, when collision occurs, we linearly probe for the next bucket and we keep probing until an empty bucket is found. In quadratic probing, we probe for the i^{2 th} bucket in i^th iteration and we keep probing until an empty bucket is found. In double hashing, we use another hash function hash₂(x) and look for i * hash₂(x) bucket in i^th iteration. It requires more computation time as two hash functions need to be computed.

This concludes that linear probing has the best cache performance but suffers from clustering, quadratic probing lies between the two in terms of cache performance and clustering while double caching has poor cache performance but no clustering.

Double Hashing

Algorithms Hashing

Hashing

Hash Function

Double Hashing

Double Hashing Procedure

Example for Double hashing

Implementation in Python

Searching a key

Deletion of a key

Pseudo-code for searching and deletion of a key

Advantages of Double Hashing

Disadvantage of Double Hashing

Compared to other hash methods

LLVM: IR, Assembly, SSA

Download webpage in Java [5 methods]