Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
In this article, we will understand in depth the designing of a data structure that supports "insert value", "delete value" and "get random" operations where these tasks are performed in constant time O(1).
Table of contents
- Problem statement + naive solutions
- Hash function and Hash table
- Hashing algorithm
- Collisions
- New Data Structure Design
- Creating the data structure
- Output
Problem statement + naive solutions
This problem requires us to create our own data structure that performs "insert value", "delete value" and "get random" operations in the quickest way possible. Let us explore the options that we can consider for this.
Suppose we take a linked list and use it for the above mentioned operations, it takes time O(n) to perform the tasks ( where n is the size of the linked list); as the whole list has to be traversed to perform any ony of those tasks.
Now let us consider a binary search tree. With this data structure, all tasks are performed in time O(log2n).Even though it considerably reduces the performance time when compared to linked lists, it still isn't fast enough.
This is why hash tables are the best fit for our problem. All the tasks are performed in time O(1) using hash tables, which is way faster than others and thus provides an optimal solution.
Before we dive in to our topic, we need to understand about hash function and hash tables.
Hash function and Hash table
Hash table is a data structure which allows us to store, delete and search values. In the hash table, the data is stored as 'keys' and 'values'. This data structure is the most efficient option to perform the above mentioned operations as it completes them in the time O(1).
In the hashing technique, the hash function is used to determine the address at which the particular value is to be stored.
Hashing algorithm
Certain calculations are applied to the key so that it can be transformed into an address. Some of the commonly used hashing algorithms are:
-
For numeric keys, divide the key by the number of available addresses (say n), and take the remainder.
i.e. address = key % n. -
For alphanumeric keys, divide the sum of ASCII codes in the key by the number of available addresses (n) and take the remainder.
-
The folding method divides the key into equal parts and then adds the parts together; depending on the size of the table, then it may be divided with some constant number and the remainder is considered.
While the above mentioned are some most used hashing algorithms, the hash function or the algorithm to be used entirely depends on the developer's choice and needs.
Collisions
When two keys have the same value after going through the hash function, it means that there are two values competing for the same index position. This problem is known as collision. There are two ways to handle collisions:
- Open Hashing or Closed addressing
- Closed hashing or open addressing
New Data Structure Design
The hash table is implemented using a dictionary and a list. The list is used here to make the insertion and deletion process easier. The time complexity for this program is O(1) and space complexity is O(n) where n is the total number of elements in the data structure.
The memory requirement for this approch is double the number of elements stored in the data structure, since it is stored twice; in the list and the dictionary. So if the total number of elements is N, the memory requirement would be 2N.
The steps to inmplement this approach are:
- Create a class datastruct.
- Initialize the list and dictionary as array and hashmap respectively for the ease of understanding.
- Define a method insert that facilitates insertion of a value in our data structure. The value is inserted into both the array as well as hashmap.
- A function delete is defined to delete an element from our data structure. The data to be deleted is first deleted from the hashmap, and then it's swapped with the last element in the array. the last element of the array is then deleted and the values are updated to the hashmap.
- A function getrandom is defined to generate random elements from the data structure.
- The printmap function is defined to print all the elements in the data structure.
- The values to be inserted, deleted, getting a random element and printing the data structure are done by specifying the task to be performed in the driver code.
Creating the Data Structure
Importing the required modules
First, we import the random module availabe in python; which contains the functions that help us get random values.
import random
Creating a class to define our data structure
Then a class datastruct is created under which all the neceressary functions are defined. The first is the magic function __ init __ (), under which all the required variables are initialized.
class datastruct:
def __init__(self):
self.array=[]
self.hashmap={}
Here, we use a resizable array or list for inserting values into the hashmap so that the process can be performed in time O(1). And the hashmap is initialized as a dictionary to facilitate storing of values as keys and values.
Inserting elements
We define the following function to insert an element into our data structure.
def insert(self,val):
if val in self.hashmap:
return
index=len(self.array)
self.array.append(val)
self.hashmap[val]=index
The above function takes the value to be inserted as its argument. It checks if the value is already present in the hash table or map; if that is the case, the function is terminated as only unique elements are inserted. Else, the index is taken to be the length of the array and the new data is appended into it. Insertion into the hash map is then performed where the key is the element to be inserted and the value is the index.
For example, if we are inserting 8 as our first element, then it will be stored in hashmap as {8:0}.
Deleting an element
For the purpose of deleting an element, we define two functions:
- swap : To swap positions of the last element and the element to be deleted.
- delete: To delete the element from the hashmap and the last element of the array after swapping.
def swap(self,val):
index=self.hashmap.get(val)
if index is None:
return
else:
size=len(self.array)
temp=self.array[index]
self.array[index]=self.array[size-1]
self.array[size-1]=temp
def delete(self,val):
index=self.hashmap.get(val)
size=len(self.array)
last=self.array[size-1]
if index is None:
return
del self.hashmap[val]
self.swap(val)
del self.array[-1]
self.hashmap[last]=index
Let us suppose our hashmap is {21:0, 54:1, 1:2, 36:3}. The array will look like [21, 54, 1, 36]. If the element 54 is to be deleted, instead of shifting the elements 1 and 36 to the left by one position after 54 is deleted, the last element 36 could just occupy the position of 54. Here is where the swap function is used. This way we can perform the task in time O(1) which otherwise takes longer.
Getting random values
The module random in python contains a function randrange() which generates a random number withnin the specified range. The below is the function to get a random element from the hashmap.
def getrandom(self):
index=random.randrange(0,len(self.array))
return self.array[index]
This function uses randrange() to generate a random index value within the length of the array and returns the data stored in that index in the array.
Printing the elements
The following function is defined to print the hashmap at any particular point when it's called.
def printmap(self):
print("Elements in the data structure:")
print (self.hashmap)
Driver code
This the the main code where the elements are inserted, deleted and random elements from the data structure are displayed as specified in the code.
myDS=datastruct()
myDS.insert(1)
myDS.insert(25)
myDS.insert(84)
myDS.insert(43)
myDS.insert(37)
myDS.printmap()
myDS.delete(25)
print("After Deletion,")
myDS.printmap()
print("Random element:")
myDS.getrandom()
Output
Elements in the data structure:
{1: 0, 25: 1, 84: 2, 43: 3, 37: 4}
After Deletion,
Elements in the data structure:
{1: 0, 84: 2, 43: 3, 37: 1}
Random element:
25
This is the output for the inputs given in the driver code.
By the end of this article at OpenGenus, you will have a clear understanding on creating a data structure that performs the specified tasks in time O(1).