unordered_set in C++ STL


In this article we are going to discuss about the unordered_set container class of the C++ Standard Template Library. unordered_set is one of the most useful containers offered by the STL. It provides search,insert,delete in O(1) on average. We'll talk more about the worst case complexity later.

Declaring a set

unordered_set<data_type> any_name_you_want 

We can even pass self defined objects in the template above.

A peek under the hood

You may be thinking how this data structure can offer search,insert,delete in O(1) time ?

Let's talk about how an unordered_set works internally. unordered_set uses a Hashtable to store our data in form of a key.In layman's terms a Hashtable is just an array. We use a hash function to convert our data into a suitable value which can be used as an index in the array.For example for integers we can use modulo by a prime number and for string we can use weighted sum modulo m where m is a prime number. Please note that the above method does not guarantees unique values and therefore two values can match. This is called as a collision . unordered_set handles collision by using a method called chaining in which we use an array or linked list to store collided keys. Thus in the worst case all keys are matched with the same hash value thus giving worst case complexity as O(n).

Some useful Methods

  • insert

We can use the insert method to add a key to our set. Remember set only contains unique values in case you pass a value which is already present it simply overwrites that key.

unordered_set<int> hashSet;
hashSet.insert(1);
hashSet.insert(5);
hashSet.insert(4);

Time complexity - O(1) on average O(n) in worst case

  • find

To find whether a key is present in our set or not we can use the find method. find method returns the iterator to the key we want. If the key is present it returns the iterator pointing to the key otherwise returns the iterator pointing to the key after the last key(which does not exist).
Above point must be noted carefully. You should use if else block, because if the key is not present it will return hashSet.end() which can show unusual behaviour when pointed to!

if(hashSet.find(5)!= hashSet.end()){
    cout << "Key present" << endl;
}
else{
    cout << "Key not present " << endl;
}

Time complexity - O(1) on average O(n) in worst case

  • count

If you are uncomfortable with pointers and iterators you can use count method which is an alternative to the find method. As the name suggests it returns the count of the key in the set. Since set does not allow duplicates it returns 1 when a key is present otherwise returns 0.

if(hashSet.count(9)){
    cout << "Key found" << endl;
}
else{
    cout << "Key not found" << endl;
}

Time complexity - O(1) on average O(n) in worst case

  • erase

The erase method is used to delete keys from the set. We can delete single as well as multiple keys at the same time. We can directly pass the key to be deleted or we can pass the iterator to that key. For deleting multiple keys we have to pass start iterator and end iterator.

hashSet.insert(11);
hashSet.insert(13);
hashSet.insert(-3);
hashSet.erase(4);
auto start = hashSet.find(11);
auto end = hashSet.find(-3);
hashSet.erase(start,end);

Time complexity - O(1) on average O(n) in worst case

  • size

The size method returns the number of keys in our set.

int s = hashSet.size();
cout << "Number of keys in the set : " << s << endl;

Time complexity - O(1)

  • clear & empty

clear method is used to delete all the keys at once from the container.It does take any parameter. empty method returns true if set is empty otherwise returns false.

hashSet.clear();
if(hashSet.empty()){
 cout << "Set is empty" << endl;
}
else{
 cout << "Set is not empty" << endl;
}

Time complexity for clear - O(n) where n is the number of keys
Time complexity for empty- O(1)

Other methods

  • begin - returns the iterator to the first key of the set
  • end - returns the iterator to the key beyond the last key of the set
  • hash_function - if you're curious about the internal hash function working you can use this method and pass some data to see the hash value
  • load_factor - you can also set the load factor of the hash table, load factor is the ratio of the slots available in the hashtable array to the number of keys inserted. By default it is 75%.For more information please refer to the hashing article on OpenGenus
  • we can also iterate through the set using the begin and end iterators. Since the set is unordered the results can vary each time.
for(auto x : hashSet){
    cout << x << " ";
}
cout << endl;

Wrapping it Up

unordered_set is fastest among all other data structures in search,insert and delete on average. It has some crucial practical usecases in company interviews as well as in competitive programming. For example it is used in questions like :

  • find duplicate or distinct elements in an array
  • find a pair in an whose sum is equal to the given sum
  • find majority elements
  • find pair with given sum in a BST

The list goes on. From the last example we can see that other data structures also make use of unordered_set. Thus this shows the importance of unordered_set in data structures and algorithms.

With this article at OpenGenus, you must have the complete knowledge of unordered_set in C++ STL. Enjoy.