Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
In this article, we have solved the problem of Longest Consecutive Subsequence using the concept of Hash Map and Union Find.
Pre-requisites:
Problem Statement
You are provided with an unsorted array and your task is to find the length of the longest consecutive subsequence of elements from the array. The program should run in O(n)
time, where n
is the length of the array.
So lets understand the problem through an example:
Input:
n=7
arr[n]=[1,43,3,2,55,4,6]
Output: 4
Explanation:
From the array we can observe that the longest consecutive elements subsequence possible is [1,2,3,4]
, hence the output is 4
.
Solution
Now lets look at the solution, here I have discussed two approaches.
Approach 1
Here we will discuss the classic naive solution.
- First step will be to sort the array
After sorting the array becomes: [1,2,3,4,6,43,55]
- Now the second step will be to iterate over the array and if
arr[i]-1 == arr[i-1]
then increment acount
variable and then store the maximum value insidelength
variable.
The following code implements the same:
#include<iostream>
#include<vector>
#include<algorithm>
using namespace std;
int main(){
int n;
vector<int>v;
//Taking input into the array
cin>>n;
while(n>0){
int i;
cin>>i;
v.push_back(i);
n--;
}
//Sorting the array
sort(v.begin(),v.end());
//Now finding the answer by iterating over the array
int length=1, cnt=1;
for(int i=1;i<v.size();i++){
if(v[i]-1==v[i-1]){
cnt++;
}
else{
length=max(length, cnt);
cnt=1;
}
}
//Considering buffer values
length=max(length, cnt);
cout<<length<<endl;
return 0;
}
The problem with this approach is that the time complexity would be nlog(n)
as sorting in present. Now lets look at the optimum approach.
Approach 2
Here in this approach we will use a hash map to store the numbers in it and then we will find the length by iterating over it.
- The first step will make, an
unordered_map<int, bool>mp
to store the values present in the array, by traversing through it. - In the next step we will iterate over the array and for each element we will do the following
- we will search for
arr[i]-1
in the map, if its present we will move on to the next element. - if its not present, that number could be the starting point of the longest subsequence, hence it will be considered as starting element and then we will look for the consecutive elements and then increment the
cnt
variable accordingly.
- we will search for
- Then we will store the maximum length inside the
length
variable and print it when the loop ends.
The following code demonstrates the above process:
#include<iostream>
#include<vector>
#include <unordered_map>
using namespace std;
int main(){
int n;
vector<int>v;
//Taking input into the array
cin>>n;
while(n>0){
int i;
cin>>i;
v.push_back(i);
n--;
}
//Storing values into the map
unordered_map<int,bool>mp;
for(int i=0;i<v.size();i++){
mp.insert({v[i],true});
}
//Finding the length
int length=1;
for(int i=0;i<v.size();i++){
if(mp.find(v[i]-1)==mp.end()){
int start=v[i];
int cnt=1;
while(mp.find(start+1)!=mp.end()){
cnt++;
start++;
}
length=max(cnt,length);
}
}
cout<<length<<endl;
return 0;
}
So the time complexity of the above code is O(n) + O(n) + O(n) = O(3n) => O(n)
It will take O(n)
time to insert the element to the map, another O(n)
to iterate over the array the second time, the third O(n)
came from the while loop which we are using to find the max length once we got the starting element. The space complexity for the above code will be O(n)
as we are using a map of size n
.
Now you will be wondering why its time is O(n)
, the major reason is we are not checking for the length using all the elements, we are only finding length for the candidate elements, whose previous number is not present in the map. This small technique will improve the time complexity of the algorithm to a large extend.
Approach 3
Here in this approach we will be creating parent and child nodes for each element in the vector and in the end will return the maximum length of connected component, this approach is termed as union find approach. This approach will give a new way of looking at the problem, so lets get into it.
Here inside int main()
we initializes a vector and accepts values into the array, then we call the longestConsecutive()
function and passes the vector v
to it.
Now inside the longestConsecutive
function, first we return 0
if the given array is empty. Then we create an unordered_map
named UF
which will keep track of the parent nodes of each element, here we keep the key as the parent node of that particular element. Initially we will set the keys ( or parent nodes) of each elements to it self.
After this we will create a lambda function find()
which finds the parent node of a particular element, through a technique called path compression. If a particular element is not the key of itself then it recursively finds the parent node and then sets it to that element.
This function is being called inside another lambda function Union()
which accepts two elements and then sets the parent of those two elements to a common one. Here we use the find
function to find the root of a particular element. If its not making sense, don't worry, it will soon make sense.
Now we will create an unordered_set
and then insert all the values inside the vector into the set, to remove the duplicate elements. Then we will start iterating through the set, and for each element n
we will check whether n+1
exists in the set, if it exists that means both of these elements would have one common parent, so we will call the Union()
function and pass n
and n+1
into the function. Here n+1
would be set as the key of n
inside UF
. After the for loop the UF
will contain each elements with their key which would be its parent element. Now we will create another unordered_map
named c
, now we will iterate through the set and will do c[find(n)]++
, here for each element find(n)
will find the root of that element and increment it inside the map. Now we will iterate over the map c
to get the largest subsequence available.
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include <iostream>
#include <functional>
using namespace std;
int longestConsecutive(vector<int> &nums)
{
if (nums.empty())
{
return 0;
}
unordered_map<int, int> UF;
for (int n : nums)
{
UF[n] = n;
}
function<int(int)> find = [&](int x)
{
if (x != UF[x])
{
UF[x] = find(UF[x]);
}
return UF[x];
};
auto union_ = [&](int x, int y)
{
UF[find(x)] = find(y);
};
unordered_set<int> S(nums.begin(), nums.end());
for (int n : S)
{
if (S.count(n + 1))
{
union_(n, n + 1);
}
}
unordered_map<int, int> c;
for (int n : S)
{
c[find(n)]++;
}
int longest_seq = 0;
for (const auto &p : c)
{
longest_seq = max(longest_seq, p.second);
}
return longest_seq;
}
int main()
{
int n;
cin >> n;
vector<int> v;
for (int i = 0; i < n; i++)
{
int no;
cin >> no;
v.push_back(no);
}
int ans = longestConsecutive(v);
cout << ans << endl;
return 0;
}
This might take a while to understand so lets take an example to understand it better.
Example:
n = 8
Vector = 44, 1, 4, 3, 55, 2, 1, 100
Initially, UF: {44:44, 1:1, 4:4, 3:3, 55:55, 2:2, 100:100}
After the for loops, UF: {44:44, 1:4, 4:4, 3:4, 55:55, 2:4, 100:100}
Inside the c map: {44:1, 55:1, 100:1, 4:4}
The maximumm of the keys (1, 1, 1, 4) would be returned (4) as the longest subsequence.
By the end of this, you would have complete understanding of how this problem is solved.