Disjoint set/ Union Find in C++ using OOP and Template

In this article at OpenGenus, we are going to study and explore Disjoint Set/Union data structure. We will see their definition, usage and how we are going to apply them to solve various complex problems in C++ language.

TABLE OF CONTENTS

  1. Introduction
  2. Disjoint set using rank
  3. Checking elements in same component
  4. Data structures needed
  5. Code with explanation
  6. Finding the parent
  7. Path compression
  8. Union operation
  9. Example with complete code
  10. Conclusion

INTRODUCTION

Two sets are called disjoint sets if they don’t have any element in common, the intersection of sets is a null set. We can perform a given set of operations on these data structures:

  1. We can add new sets to them.
  2. Using Union operation we can merge two disjoint sets into a single disjoint set.
  3. Using find operation, we can find the ultimate parent or representative element of the disjoint set.
  4. We can also check if two sets are disjoint or not.

It is a very important data structure which we can use to solve dynamic graphs problem. Also with respect to graphs it signficantly reduces the time complexity(eventually performs in constant time) when finding whether two nodes are part of the same component or not.

In the next few sections we would explore all of these with help of cpp code.
Implementation of Disjoint set data structure can be done in two ways :

  1. With help of Rank
  2. With help of Size

Here we will see implementation with help of rank.

Disjoint Set Using Rank

Rank here can be defined as the position of the elements in our set. One may confuse it with the height part of graph but we would see how it is different.

Suppose we are given some pair of elements :

1 2
2 3
4 5
6 7
5 6
3 7

Now we have to join to form a component. In between queries like whether 3 and 7 belong to same component may be asked. Now We can see these elements as nodes of a graph and we have to join them. These types of graph are termed as Dynamic Graphs as they keep changing.

Inititally we assume that all the elements as individual components. We tend to connect them as given in the sequence. As elements tend to attach to each other they form sets.

CHECKING ELEMENTS IN SAME COMPONENT

To check whether two elements are in the same componenets or not, we keep a representative of every set. Every element of that set would know representative of its set. Now if two elements have same representative that means they belong to the same set othewise they belong to different sets. This is the basic idea behind Union and finding elements of same component.

DATA STRUCTURES NEEDED

  1. ARRAY : We need two arrays here. One would be used to store the parent of our element. Other array would be used to store rank of the elements.

Here we would both these arrays with help of vectors.

CODE WITH EXPLANATION

We would be making a class named Disjoint_set. The class would represent our disjoint set data structure. For this class the member variables defined are two vectors named rank, parent. In this class we have made a constructor which would initialise both the vectors. We initialise rank array as 0 becuase initially every element is considered as a single element not joined with element. Also initially every element is considered as an indivual set, and the representative of the set is the element itself. As we join different sets the representative is changed accordingly. For that the union rules listed below the code have to be followed.

The methods of the class would be:
1.Union_rank : It is used for the union of two sets.
2.findParent : To find the parent/representative of the set.

Both these are explained in the sections below.
Here template T is used so that the code can be reused for different data types.

template <typename T>
class Disjoint_set
{
	vector<T> rank,parent; // declaring both vectors.
public:

	Disjoint_set(T n)
	{
		rank.resize(n + 1, 0); // resizing them to store n+1 elements
		for (auto i = 0; i <= n; i++)
		{
			parent.push_back(i); // initially parent of every element is element himself.
		}
	}
  };

Now for union we would be :

  1. Compare ranks of the ultimate parents of the two given elements.
  2. Attach the smaller rank element to the larger rank element.
  3. If two elements have same rank then attach anyone to anyone, here increase the rank by 1.

Ultimate parent is the representative of the set, we would store it in the parent vector for every element.

FINDING THE PARENT

The parent of the element would be the element to which it is attached.
Now for union we have to find the ultimate parent that is the representative of the set. The ultimate parent of the set would be the element which has its parent as itself. So we have to do recursive calls to find the ultimate parent of any element.

	T findParent(int n)
   {
   	if (parent[n] == n) // base case condition.
   	{
   		return n;
   	}

   	return findParent(parent[n]);
   }

Now to further reduce our time we can use a techinique called memoziation.

PATH COMPRESSION

The memoziation technique is effectively called path compression in this case. The basic idea is to connect every element to its ultimate parent. Then it would be easy to perform the union operation. In this case we do not have to do recursive calls every time but if the answer is present already, then it would return it otherwise recursion would be performed and answer will be stored accordingly. In simple words we are going to connect every element to the representative of the set. So when performing union we can easily access them.

	int findParent(int n)
	{
		if (parent[n] == n)
		{
			return n;
		}
        
		return parent[n] = findParent(parent[n]); // here the answer is stored here so that the answer can be accessed afterwards.
	}

UNION OPERATION

Now we would be performing the union operation. The rules of performing union have been mentioned earlier.

void Union_rank(T n1, T n2)
	{
		T ulp_u = findParent(n1); //finding the ultimate parent
		T ulp_v = findParent(n2);

		if (ulp_u == ulp_v) // if parent same then belong to same set so return,n o need to do union
		{
			return;
		}

		if (rank[ulp_u] < rank[ulp_v]) 
		{
			parent[ulp_u] = parent[ulp_v];
		}
		else if (rank[ulp_u] > rank[ulp_v])
		{
			parent[ulp_v] = parent[ulp_u];
		}
		else
		{
			parent[ulp_v] = parent[ulp_u];
			rank[ulp_v]++; // if rank equal then increase rank.
		}
	}

EXAMPLE

Now here is the complete code with example. Here we have added the complete class with the member methods and member variables. We have created the object of the class.

For Object creation Syntax :

classname<typename> objectname(number of elements in the set)

In the example below we have created an object named ds which has 7 elements.
Now to access the member methods syntax is :

objectname.methodname();

For example

Disjoint_set<int> ds1(9); // it would create a disjoint set data structure having nine elements

//Now we have to join elements 1,2 into a single set

ds1.Union_rank(1,2);

//if we have to find the parent of 1

int par = ds1.findParent(1);

//Therefore this way we can the classes and objects

Now the complete code below :

#include<bits/stdc++.h>
using namespace std;

template <typename T>
class Disjoint_set // this class can be reused everywhere
{
 vector<T> rank, size, parent; // declaring both vectors.
public:

 Disjoint_set(T n)
 {
 	rank.resize(n + 1, 0); // resizing them to store n+1 elements
 	size.resize(n + 1, 1);
 	for (int i = 0; i <= n; i++)
 	{
 		parent.push_back(i); // initially parent of every element is element himself.
 	}
 }

 T findParent(T n)
 {
 	if (parent[n] == n)
 	{
 		return n;
 	}

 	return parent[n] = findParent(parent[n]); //using path compression
 }

 void Union_rank(T n1, T n2)
 {
 	T ulp_u = findParent(n1); //finding the ultimate parent
 	T ulp_v = findParent(n2);

 	if (ulp_u == ulp_v) // if parent same then belong to same set so return,n o need to do union
 	{
 		return;
 	}

 	if (rank[ulp_u] < rank[ulp_v]) 
 	{
 		parent[ulp_u] = parent[ulp_v];
 	}
 	else if (rank[ulp_u] > rank[ulp_v])
 	{
 		parent[ulp_v] = parent[ulp_u];
 	}
 	else
 	{
 		parent[ulp_v] = parent[ulp_u];
 		rank[ulp_v]++; // if rank equal then increase rank.
 	}
 }
};

int main()
{
 Disjoint_set<int> ds(7);
 ds.Union_rank(1, 2);
 ds.Union_rank(2, 3);
 ds.Union_rank(4, 5);
 ds.Union_rank(6, 7);
 ds.Union_rank(5, 6);

 // are 3 and 7 in same component?

 if (ds.findParent(3) == ds.findParent(7)) //IF ULTIMATE PARENT IS SAME
 {
 	cout << "SAME" << "\n";
 }
 else
 {
 	cout << "DIFF" << "\n";
 }

 ds.Union_rank(3, 7); // now we put 3 and 7 into the same set and then check 

 if (ds.findParent(3) == ds.findParent(7))
 {
 	cout << "SAME" << "\n";
 }
 else
 {
 	cout << "DIFF" << "\n";
 }
}

OUTPUT

DIFF
SAME

CONCLUSION

The time complexity of creating n single item sets is O(n). The time complexity of find() and Union() operations is O(log n). Hence, the overall time complexity of the Disjoint Set Data Structure is O(n + log n).

The space complexity is O(n) because we need to store n elements in the Disjoint Set Data Structure.