×

Search anything:

Open Addressing - a collision handling method in Hash Tables

Internship at OpenGenus

Get this book -> Problems on Array: For Interviews and Competitive Programming

In this article, we have explored Open Addressing which is a collision handling method in Hash Tables. We have explored the 3 different types of Open Addressing as well. It is also known as Closed Hashing.

Table of Contents

  • Introduction to Hashing
  • Handling Collision
  • Open Addressing
  • Linear Probing
  • Quadratic Probing
  • Double Hashing
  • Comparison of Three Collision Handling Techniques
  • Advantages of Open Addressing

Introduction to Hashing

Hashing:
A method used for storing, finding, and sorting data in a database or array. Using this process, values are paired with keys to be stored in the hash tables in an array format. For example, in a school students are assigned unique roll numbers or registration numbers to store each student's information. These keys are utilized to insert, remove, and locate data very quickly.

Hashing Function:
Hashing turns large data into small hash values by applying hashing functions. These hash values are then stored in the hash table. Requirements for a good hash function:

  1. Collision-free, meaning two values should not be stored in the same index
  2. Evenly assigns keys and values in the hash table
  3. Makes calculations easy

There are various methods applied to find indexed to assign values. These methods include:

  • remainder of value divided by table size n; value mod n
  • square of the value and insert to that index, or picking the middle digit of a large value for the index

However, such methods may collide with assigning values. For example, any two values can have the same index number but cannot be stored in the same index.

Handling Collision

As each index in a hash table can store only one value, these conflicts need to be resolved. There are several techniques to handle these collisions and find other positions for the conflicting values in order to maintain the efficient performance of the hash table.

Collision Resolution Techniques:

  1. Open Hashing (Chaining)
  2. Closed Hashing ( Open Addressing)

Open Hashing or Chaining method creates an external chain of values that has the same index. The chain is generated from that position as a linked list. Collision is resolved by storing multiple values together in that same index.

Closed Hashing or Open Addressing tries to utilize the empty indexes in a hash table for handling collision. In this method, the size of the hash table needs to be larger than the number of keys for storing all the elements.

This article explains the function of closed hashing or open addressing technique, its approaches, and advantages.

Open Addressing

Open addressing or closed hashing is the second most used method to resolve collision. This method aims to keep all the elements in the same table and tries to find empty slots for values. Closed hashing refers to the fact that the values always stay stored in the hash table. Open addressing is named because the locations for the values are not fixed and can be addressed to an empty slot if a collision happens.

This method resolves collisions by probing or searching through the hash table for indexes that are available for storing elements. Unlike open hashing or chaining, open addressing stores one value in each index. The basic functions of this method are to add, remove or find an element.

It has different approaches for these functions but the well-knowns are:

  1. Linear Probing
  2. Quadratic Probing
  3. Double Hashing

Linear Probing

It is considered the easiest collision resolution technique. If a collision occurs, this method performs a sequential search on the hash table. It finds the next empty slot and inserts the value in that slot.

If the computed index is hash(x) and the table size is n,
index = hash(x) % n

In the case of a collision,
index = (hash(x) + i) % n

Here, i is the number of times a collision occurs or the number of times a value needs to be rehashed. It means, after calculating the index if a collision occurs then i = 1; if the cell is not available the first time then i = 2, and so on.

Problem:
Though linear probing is easily implemented, it faces some clustering problems such as Primary Clustering. It implies that linearly searching through the hash table to find a vacant position or an element can be difficult if the table contains a large number of data.

Quadratic Probing

A quadratic probing approach is taken to resolve the primary clustering problem that occurs in the linear probing method. This technique performs a quadratic or square-shaped search in the occurrence of a collision. Therefore, a larger gap is created during the search process of finding an element or an empty slot.

Again, if the computed index is hash(x) and the table size is n,
index = hash(x) % n

In the case of a collision,
index = (hash(x) + i^2) % n

Here as well, i indicates the number of attempts taken to find the slot or value. If the calculated index is found to be not available, i = 1 and after the second attempt fails then, i = 2. In this method, the probing is no longer sequential rather the intervals become greater than linear probing.

Problem:
Even in this probing technique, there can be a clustering problem. This is called Secondary Clustering. This cluster forms in a similar way as primary clustering. If the multiple values point to the same index, they will follow the exact same process as the previous one. Therefore, it can be once again difficult to find a location suitable for a value.
As a matter of fact, it cannot guarantee an empty position for a value when more than half of the hash table is full.

Double Hashing:

This method is considered one of the best approaches for hashing. Clustering can be avoided in this technique by using two hash functions. These hash functions are also used for handling collisions. The first function calculates the primary location for a value and the second function calculates the probing gap to find an alternative location for a value.

If computed index is hash1(x) and size of the interval is hash2(x),
hash1(x) = x % n
hash2(x) = (n-1) - (x % n)
Therefore,
index = (hash1(x) + (i * hash2(x))) % n

Here, x is the value that needs an index, n is the table size which should be a prime number, and i is the number of times a collision occurred. The (n-1) in the hash2(x) function can be any number smaller than the table size. The probing here happens rather randomly, maintaining a consistent manner throughout the hash table.

Comparison of Three Collision Handling Techniques

  • Linear probing is easier to compute than the other methods. The calculating formula is rather simpler than those of quadratic probing and double hashing.
  • Due to the complex formula, double hashing takes longer to compute in comparison to linear probing and quadratic probing.
  • Double hashing resolves the clustering problems faced in linear and quadratic probing.
  • In the case of cache performance, linear probing has the best performance. As double hashing requires the computation of two hash functions, it has poor cache performance.

Advantages of Open Addressing

  • Open addressing improves cache speed because all the data is stored within the hash table
  • It properly uses its empty indexes and improves memory efficiency
  • As there is no linked list or pointer involved, the performance is faster than chaining or open hashing.

With this article at OpenGenus, you must have the complete idea of Closed Hashing or Open Addressing.

Open Addressing - a collision handling method in Hash Tables
Share this