Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
In this article, we will discuss about Double Hashing, a technique to resolve hash collisions in hash tables along with Time Complexity analysis of Double Hashing.
What we will see,
- Hashing
- Hash function
- Double Hashing
- Double Hashing Procedure
- Explained through an example
- Implementation in python
- Searching a key in open address hash table
- Deletion from open address hash table
- Pseudo-code for searching and deleting a key
- Advantages
- Disadvantages
- Compared to other hash methods
- References
Hashing
Hashing is an improvement over Direct Access Table. The idea is to use a hash function that converts a given key number to a smaller number and uses the small number as the index in a table called a hash table.Hash Function
Hash function is a function that converts a given big number to a small practical integer value. The mapped integer value is used as an index in the hash table. In simple terms, a hash function maps a large string or big number to a small integer that can be used as an index in the hash table.Double Hashing
Double hashing is a computer programming technique used in conjunction with open addressing in hash tables to resolve hash collisions, by using a secondary hash of the key as an offset when a collision occurs. Double hashing uses the idea of applying a second hash function to key when a collision occurs.
Double Hashing Procedure
Double hashing can be done using:- (hash1(key) + i * hash2(key)) % Table_size; where hash1() and hash2() are hash functions and Table_size is size of hash talbe. This process is repeated by incrementing i when collision occurs.
Procedure
- Take the key you want to store on the hash-table.
- Apply the first hash function h1(key) over your key to get the location to store the key.
- If the location is empty, place the key on that location.
- If the location is already filled, apply the second hash function h2(key) in combinatioin with the first hash function h1(key) to get the new location for the key.
Example for Double hashing
Implementation in Python
def double_hashing(keys, hashtable_size, double_hash_value):
hashtable_list = [None] * hashtable_size
for i in range(len(keys)):
hashkey = keys[i] % hashtable_size
if hashtable_list[hashkey] is None:
hashtable_list[hashkey] = keys[i]
else:
new_hashkey = hashkey
while hashtable_list[new_hashkey] is not None:
steps = double_hash_value - (keys[i] % double_hash_value)
new_hashkey = (new_hashkey + steps) % hashtable_size
hashtable_list[new_hashkey] = keys[i]
return hashtable_list
values = [26, 54, 94, 17, 31, 77, 44, 51]
print( double_hashing(values, 13, 5) )
Output
[26, None, 54, 94, 17, 31, 44, 51, None, None, None, None, 77]
Searching a key
The algorithm of searching a key in the hash table loops through the initial hash value while the busy slots not equal to zero in the hash table by incrementing the iterative variable by 1 and modulo by size. It checks if the key is the key we are looking for by returning true if the key is found else returning false if the key doesn't exist in the hash table. The **contains method** in the pseudo-code for searching and deletion in a hash table is the search method which searches for a key's existence in the hash table.
Time complexity for searching a key in hash table
The searching algorithm takes constant time as each lookup in the hash table costs O(1) time. In the worst case configuration, when the key is not in the array we iterate through the whole array which costs linear time. Thus, the worst case time complexity for searching a key in hash table is O(n).
Deletion of a key
Open address table doesn't use any linked lists type of data structure. We can imagine, that there are implicit lists, which can contain keys with dfferent hash values. Each busy cell in the array is the beginning of the list in open address table. Busy cell in the array can be contained in some lists. It is the main hurdle to delete the elements from these implicit lists, because we cannot mark the deleted slot as empty, as it will demolish the lists. The classical solution for saving these lists in corrct state is to mark deleted slot by special value. We can add new key into this cell, but cannot interrupt a search, if we reach such slot. The discadvantage of this method is lists can be too long which we obtain after many deletions, unnecessary space allocation to the deleted slot, as such deletion process doesn't reduce the size of the list. In the worst case we can have actually empty table with many slots deleted and marked, but search may take a lot of time, because we have to examine all the deleted(marked) slots in the probe sequence.
There is an another method to delete keys from open address hash table. After deleting the key, some gaps appear in the lists, which contain the deleted slot. Therefore, we will compress lists, which contain the deleted slot, so these lists remain in correct state. Deleting process partitions into two parts:-
- Search the deleted key and mark the slot as empty.
- Compress all the lists, which included the deleted key.
After marking the deleted slot as the empty slot, we continue to pass the probe sequence until we reach a free slot. We keep the index of empty slot, initially it equals index of deleted slot. If the current element in the sequence can be moved to a free slot, we move the element, and change the variable free, as the current slot becoes free. If the free slot and the current element can be in the same list, that means when we were adding the current element, we could add it in this free slot, we can move the current element but this slot was not free and we continued searching for free cell. To make this decision, we use the array variable named ex and the variable named off, where off is keeping the offset from the free slot to the current element and (ex[i]-1) equals the amount of the busy slots, which we skipped during probing, while we were adding a key. so the condition to determine to move the key or not, is ex[i] should be greater than off value.
Time complexity for deleting a key
The deletion algorithm first searches the key to be deleted taking constant time, deleting it and compresses the list. Since deleting the key and compressing the list takes constant time, the time complexity of the deletion algorithm depends on the searching time with linear worst case complexity. Thus, time complexity for the worst-case configuration is O(n).
Pseudo-code for searching and deletion of a key
class HashTable {
final int SIZE = 1000000; // size of the talbe
long[] a = new long[SIZE];
int[] ex = new int[SIZE];
// add new key
boolean add(long key) {
int i = h(key), j = 1;
for (; ex[i] != 0; i = (i+1)%SIZE, j++) {
if (a[i] == key)
return false;
}
ex[i] = j;
a[i] = key;
return true;
}
// Search method to search for a key in the table.
// return true, if key is contained in the table
boolean contains(long key) {
for (int i = h(key); ex[i] != 0; i = (i+1)%SIZE) {
if (a[i] == key)
return true;
}
return false;
}
// Deleting a key from the table:-
// removes key from the table
boolean remove(long key) {
for (int i = h(key); ex[i] != 0; i = (i+1)%SIZE) {
if (a[i] == key) {
ex[i] = 0;
compress(i);
return true;
}
}
}
// compresses lists
void compress(int free) {
int i = (free + i) % SIZE, off = 1;
for (; ex[i] != 0; i = (i+1)%SIZE, off++) {
if (ex[i] > off) {
// move current element
a[free] = a[i];
ex[free] = ex[i] - off;
// mark current slot as free
ex[i] = 0;
off = 0;
free = i;
}
}
}
int h(long key) {
return (int) Math.abs(key % SIZE);
}
}
Advantages of Double Hashing
- The advantage of Double hashing is that it is one of the best form of probing, producing a uniform distribution of records throughout a hash table.
- This technique does not yield any clusters.
- It is one of effective method for resolving collisions.
Disadvantage of Double Hashing
- Double hashing has poor cache performance but no clustering.
- Double hashing requires more computation time as two hash functions need to be computed.
Compared to other hash methods
In linear probing, when collision occurs, we linearly probe for the next bucket and we keep probing until an empty bucket is found. In quadratic probing, we probe for the i2 th bucket in ith iteration and we keep probing until an empty bucket is found. In double hashing, we use another hash function hash2(x) and look for i * hash2(x) bucket in ith iteration. It requires more computation time as two hash functions need to be computed.
This concludes that linear probing has the best cache performance but suffers from clustering, quadratic probing lies between the two in terms of cache performance and clustering while double caching has poor cache performance but no clustering.