Optimizations in Union Find Data Structure

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

A union-find data structure is also called disjoined set data structure as it a collection of disjoined subsets.

disjoined

This table summarizes the optimizations in Union Find data structure (explained in detail further into this OpenGenus article):

Time complexity	Case
O(n)	No updation of parent pointer in Find + No control of tree heights in Union
O(m lg(n))	No updation of parent pointer in Find + use of rank in Union
O(log*(n))	Union by Rank/Size
O(α(n))	Path compression + union using size or rank

It is the grouping of elements in non-empty subsets such that every element occurs in only one subset. Thus the in union find data structure no element should be included more than once.

Operations on Union/Find Data Structures:

The basic operations which can be performed include:

Addition: Adding of new sets.
Union: This refers to the merging of two sets.
Find: As the name suggests this allows finding of an element to its respective set.

Data Structure Involved: Disjoined-Set Forest

Now that we have a basic idea about what union find stands for let's take a look at the data structure involved.

For using the the disjoined-set data structure, most often the disjoined-set forest is used.

Forest: The term forest comes under the graph theory according to which forest is an undirected path in which any two vertices are connected by atmost one path, which is different from a tree in which any two vertices are connected by exactly one path.

Root node: It's the topmost or highest node of the structure.

Parent Node: Any tree which has child/children are called so.

graphtheory

Thus, this forest data structure performs union and find actions.

Working of disjoined-set:

Let's try out this data structure with the help of an example:

Let there be six sets S₁, S₂, S₃, S₄, S₅, S₆.
Now let {S₁=10}, {S₂=20,30}, {S₃=40}, {S₄=50}, {S₅=60}, {S₆=70}.

Union of two set:
Union(S₂, S₄) = S₂={20,30,50}

Leads to merging of S₂ and S₄ into one disjoined set S₂.

Find:
If Find(i) is performed it returns the corresponding root node value, like Find(30)=20.

Below is the C++ code snippet for union and find:

#include<iosteam>
using namespace std;

class disjoinedset{
int root(int parent[], int);

int find(int r){
if (parent[r]==r)
return r;
return find(parent[r])    // recursion till root is found
}

void union( int x, int y){
        int o = find(x);
        int p = find(y);
 
        parent[o] = p;
    }

}

This method does not provide good performance and results in the formation of unbalanced trees.

MakeSet() operation:

This operation enables addition of a new element in a new set/subset into the data structure. For disjoined set forest a MakeSet initializes node's size and pointer.

Code function for MakeSet():

function MakeSet(x)
        x.parent = x
        x.size = 1     // for node's size
        x.rank = 0     // for node's rank

A more balanced approach for a more balanced tree:

Weighed Union: This technique is used to simplify the tree by reducing it. This done by joining the tree. The root of subtree with lesser number of nodes points towards the root of subtree with larger amount of nodes which leads to joining as well as reduction of tree's height.

Path Comparison:

The main concept behind path comparison is that on the way to root note the itiration takes place in a recursive manner by making every node visited point towards the root node. Here it is the Find() which works recursively. This makes a less complicated flatened structure.

The code function for the same is:

function Find(x) is
    if x.parent != x then
        x.parent = Find(x.parent)
        return x.parent
    else
        return x
        end if end function

Union by size and Union by rank:
For the formation of a more simplified tree these methods are used. The height of the tree is controlled by storing node information(node's size or node's rank respectively) beside parent pointer.

Code function for Union by size:

function Union(x, y) is
    x = Find(x)             
    y = Find(y)
    if x = y then            // for equal x and y
    end if
    
    else 
    if x.size < y.size then   // for unequal x and y
        (x, y) = (y, x)
    end if

    
    y.parent = x               //making x parent
   
    x.size = x.size + y.size    // updation of size
end function

In union of rank, ranks of the nodes x and y are compared. If ranks are unequal node which is larger becomes the parent and if they are equal either one can become the parent.

Function code for union by ranks:

function Union(x, y) is      
    x = Find(x)               //finding nodes of x and y
    y = Find(y)

    if x = y then               //if nodes are same
    end if

    if x.rank < y.rank then      //if x<y 
        (x, y) = (y, x)
    end if

    y.parent = x               //making x as the parent
    if x.rank = y.rank then     // increment accordingly
        x.rank = x.rank + 1
    end if
end function

Inverse Ackermann Function:

Inverse Ackermann function is a part of the recursion theory which also comes to sight in the time complexity of disjoined set.

According to mathematical definition:
α(m,n) = min{i≥ 1: A(i, ⌊ m/n⌋) > log2 n}
This is a two-parameter variation in which A(i,j) is Ackermann Function.
The value of this function grows extremely slowly.

Amortized Running Time:

This refers to the calculation of the algorithmic complexity in terms of time or memory used per operation. It's used when mostly the operation is fast but on some occasions the operation of the algorithm is slow.

Time Complexity of Union Find data structures in different cases

Time complexity	Case
O(n)	No updation of parent pointer in Find + No control of tree heights in Union
O(m lg(n))	No updation of parent pointer in Find + use of rank in Union
O(log*(n))	Union by Rank/Size
O(α(n))	Path comparison + union using size or rank

Note: α(n) is inverse Ackermann funtion; log*() is iterated logarithm

Efficiency of an algorithm

An efficient algorithm is the one which uses computer resources such as time and space in a minimum fashion or in ways which are considered reasonable. Such algorithm also make more improved use of data structures involved making it more optimal and favorable.

The Disjoined Set Union or Union Find has high efficiency in terms of time and space used and also the nature of data structure involved.

Union/Find application:

Krushkal's algorithm: This is used for finding MST (minimum spanning tree) of an edge of connected undirected graph.
Reducing the complexity of an algorithm.
Cycle of a graph.