Check if a string is a subsequence of another string

Binary Tree Problems books

Get FREE domain for 1st year and build your brand new site

In this problem, we will see how we can check whether a string is a subsequence of another string. Now we will consider an example to understand the given problem.

Let's say we have a string 'Welcome to OpenGenus' and a substring OGenus.
Now we need to find whether we have a substring or not.

mainString = S1 = "Welcome to OpenGenus";

Subsequences (S2) are:

OpenGenus
OGenus
Wlcomepen [Notice we have deleted few characters in mainString]

Hence, S2 is a subsequence of S1 if we can convert S1 to S2 by deleting a few characters of S1. Subsequence is a generalization of substring where substring is a continuous set of characters while in subsequence, characters need not to continuous.

We will use these approaches to solve this problem:

  1. Naive approach O(N * 2^N) time
  2. Optimised approach O(N) time

Naive Method

We will make all possible subsequences of the string we have (mainString) in the length of the string we are given to find (stringToFind).

The steps are:

  • Create all subsequences of string S1
  • For each subsequence P, check if P == S2
  • If P == S2, then S2 is a subsequence of S1.
  • If no subsequence matches, S2 is not a subsequence of S1.

PseudoCode:

How to generate all subsequences of a string S1?

  • The key idea is to use a number N of B bits where B is the lenght of S1.
  • If the i-th bit of N is 1, then i-th character of S1 is included in the subsequence.
  • If the i-th bit of N is 0, then i-th character of S1 is not included in the subsequence.
  • We can get all possible values of N by traversing from 0 to 2^B-1.
Function genrateSubstr(mainString)
    array subseq[]
    int j, counter, lenReq=length(mainString)
    powSize = 2^lenReq // Number of subsequences
    for counter = 0 to counter = powSize :
		string x="";
		for j = 0 to j = lenReq:
            if (counter&(1<<j) !=0:
                x= x + mainString[j]
		subseq.add(x);
	return subseq;
    
Function isSubstring(subs,stringToFind)
    for(string x:subs):
	 if(x==stringToFind):
	    return "Match Found"
    return "Not Found"
    
subseq = genrateSubstr(mainString)
for each S in subseq:
    if isSubstring(S, stringToFind)
        then stringToFind is subsequence of mainString

Algorithm

We will create 2 functions First one will generate all subsequences possible.
The second one will match and try to find the right subsequence.

Algorithm for Subsequence Generation:

Step 1: First we will calculate the number of time it would require to run the process. Since we have N size string it will generate about 2 N strings.

Step 2: Now we will have two nested loops where the first one runs 2 N times and the second one just runs to the length of our string and implements logic.

Step 3 (a) The logic involved is bit manipulation. Consider as we have a string required to give subsequence say mars. Now we will require 2 4 which is 16 subsequences, as we can see in any subsequence of mars we will have either one value present or not. This means there are only 2 possibilities for a character to exist in a subsequence.

Hence we can safely represent them as in Binary form like 1111 which is "mars" and 0000 which will be "". Thus we can utilize this technique to generate 2 4 subsequence.

Step 3 (b) Now we have

    0000  - ""
    0001  - "m"
    0011  - "ma"
    0111  - "mar"
    . 
    .
    1001 - "ms"
    .
    1111 - "mars"

This is done by a simple formula counter & ( 1<<j ) ! = 0

Here this is a special case meaning we left Shift 0001, j times and then we take the AND of both counter and 1<<j to get the present characters till 0 comes in. When that happens we restart the loop.

Step 4: This step has repeated a total of 2N times over a length of mainString.

Step 5: Finally we traverse the array of these constructed subsequences and lastly through linear search verify the presence of the string.

Example:

Let's take these strings

string mainStr = "Welcome to OpenGenus";
string str = "OpenGenus";

So now we pass mainStr to the generator function. Now we observe the status of variable string x (consider the pseudo-code)
so initially:
Here as the loop iterates we try to append the value of x. We continiously iterate the mainString to select the characters for making a subsequence.
So here,

    x:  j: 0 counter: 0 (1<<j): 1 (counter&(1<<j)):0
    x:  j: 1 counter: 0 (1<<j): 2 (counter&(1<<j)):0

1<<j implies that we create 001 as 010 in binary or simply we multiply power of 2 j times. Simply 1* pow(2,j).

Now we observe the status at the mid of the internal loop

Here with further iteration x still hasn't taken a value as counter&(1<<j)=0
Now it happens as we try to seek a value where counter is location of main loop here 0 and (1<<j) is 16 = 10000.
Opearing & we get 00000 & 10000 = 000000.

    x:  j: 4 counter: 0 (1<<j): 16 (counter&(1<<j)):0
    x:  j: 5 counter: 0 (1<<j): 32 (counter&(1<<j)):0
    x:  j: 6 counter: 0 (1<<j): 64 (counter&(1<<j)):0

At the end of the first iteration we have value:
As we can see it is not suitable for operation (counter&(1<<j)) to give non-zero we seek the end of loop by pure traversal throughout the string.

    x:  j: 18 counter: 0 (1<<j): 262144 (counter&(1<<j)):0
    x:  j: 19 counter: 0 (1<<j): 524288 (counter&(1<<j)):0

which is a good result as "" is also a subsequence.

Now the loop restarts and set's the value as

    x:  j: 0 counter: 1 (1<<j): 1 (counter&(1<<j)):1

Here we add it to the array of strings finally.

For better understanding, here's how our algorithm takes in the non-empty strings.

    x: W j: 1 counter: 9 (1<<j): 2 (counter&(1<<j)):0
    x: W j: 2 counter: 9 (1<<j): 4 (counter&(1<<j)):0

Now for the iteration in the mid.As we can infer from here it has (counter&(1<<j)) becoming non-zero at multiple places. It happens as we set counter as 9 which is 1001 and 1<<j becomes anything other then 0XX0.( X= 1 or 0 )

so when we see this we can append that x and contune throughout the string to see if we can get more subsequence.

    x: Wc j: 11 counter: 9 (1<<j): 2048 (counter&(1<<j)):0
    x: Wc j: 12 counter: 9 (1<<j): 4096 (counter&(1<<j)):0

Lastly, before we add the string to our array

    x: Wc j: 18 counter: 9 (1<<j): 262144 (counter&(1<<j)):0
    x: Wc j: 19 counter: 9 (1<<j): 524288 (counter&(1<<j)):0
    x:  j: 0 counter: 10 (1<<j): 1 (counter&(1<<j)):0

Thus in this way, we add subsequences to our array and do the linear search over the array.

For detail over Linear Seach read this article

Code in C++

#include <iostream>
#include <bits/stdc++.h>
using namespace std;

#define pb push_back

vector<string> subseqGen(string main){
	int lenReq=main.size();
	int powSize = pow(2,lenReq);

	vector<string> subseq;
	for (int counter = 0; counter < powSize; counter++) {
		string x="";
		for (int j = 0; j < lenReq; j++) {
			if((counter&(1<<j))!=0)x+=main[j];
		}
		subseq.push_back(x);
	}
	return subseq;
}

string matchStr(vector<string> subs,string req){
	for(string x:subs){
		if(x==req){
			return "Match Found";
		}
	}
	return "Not Found";
}

int main(){
    string main,sub;
    
    getline(cin,main);
    getline(cin,sub);
    
    cout<<main<<endl;
    cout<<sub<<endl;
    
    vector<string> subseq = subseqGen(main);
    string result=matchStr(subseq,sub);
    cout<<result<<"\n";

    return 0;
}

Output

Match Found

Time Complexity

Here we use O(N) to compare the generated strings with our stringToFind
and use O( 2N ) to create an array of different substrings.

So, Time complexity is O(N * 2 N)

Optimised Algorithm

Here we iterate each character of the mainString to match it with our stringToFind and finally print the result.

The steps are as follows:

  • Specify the main string (S1) and sub string (S2)
  • Initialize iterator for S1 (say i) and S2 (say j) at the first character
  • If S1[i] == S[j], then increment both i and j
  • If S1[i] != S[j], then increment i only
  • If i has reached the end of S1 but j has NOT reached the end of S2, then S2 is not a subsequence of S1.
  • Else S2 is a subsequence of S1.

The time complexity is O(N).
The space complexity is O(1).

PseudoCode:

Function findSubSeq(mainString,subString):
 while i<length(mainString) && j<length(subString):
     i++
     if mainString[i]==subString[j]:
         j++
 if j==length(subString):
     return "Match Found"
 return "Not found"

Working

Here rather then genrating we utilize the already available string by simply counting the appearance of each character and if it matches we have our result.

Steps:

First: We take in two variable i and j. One for iterating mainString and next for counting the character matched.
Second: We iterate till we get i equal to mainString size and j is also about same size as subString.
Third: For the steps involved we increment our variable i for finding a match of string as we compre it with subString at jth position. If found we increment j else we continue.
Lastly: We compare the length of subString with the value of j.Lastly we report the result.

Code in working:

For this consider the pseudocode first.

Now we have first iteration with i all the way to size of mainString. Now we have j which is initialized as 0. Now we take the first iteration result.

i: 0 j: 0 main[i]=W sub[j]=O

Here we have a clear understanding of how every variable is responding to the first call. Now we study how it affects the next iterations.

Now the variables j remains same as increment condition not satisfied and consequnetly mainString[i] and subString[j] changes

i: 1 j: 0 main[i]=e sub[j]=O
i: 2 j: 0 main[i]=l sub[j]=O

i: 10 j: 0 main[i]=  sub[j]=O
i: 11 j: 0 main[i]=O sub[j]=O

This happens till we encounter 12th element or 11th index which matches out subString[0] so now condition is satisfied and j is incremented.

i: 12 j: 1 main[i]=p sub[j]=p
i: 13 j: 2 main[i]=e sub[j]=e
i: 14 j: 3 main[i]=n sub[j]=n
i: 15 j: 4 main[i]=G sub[j]=G
i: 16 j: 5 main[i]=e sub[j]=e
i: 17 j: 6 main[i]=n sub[j]=n
i: 18 j: 7 main[i]=u sub[j]=u
i: 19 j: 8 main[i]=s sub[j]=s

Similar changes occours as we find matches for subsequence and thus we accomplish the task in mere 19 iteration which is exact length of our string.

Code in C++

#include <iostream>
#include <bits/stdc++.h>
using namespace std;


string findSubSeq(string main,string sub){
 int i=0,count=0;
 int msize=main.size();
 int ssize=sub.size();
 
 while(i<msize && count<ssize){
  if (main[i]==sub[count])count++;
  i++;
  }
 
 if(ssize==count)return "Match Found";
 
 return "Not Found";
}

int main(){
 
 string main,sub;
 getline(cin,main);
 getline(cin,sub);

 string result = findSubSeq(main,sub);
 cout<<result<<"\n";
 return 0;
}

Output

Match Found

Time Complexity

In this solution, we go through the string only once, hence we have O(N) as Time complexity.

The space complexity is O(1).

With this article at OpenGenus, we have the complete knowledge of how to check if a string is a subsequence of another string efficiently in linear time O(N).