# Find if a string is a sub-string of another string

Reading time: 35 minutes | Coding time: 15 minutes

A substring is a contiguous sequence of characters within a string. For instance, **open** is a substring of **opengenus**. We have presented two approaches to check whether a string is a substring of another string namely **naive method** which takes O(N^2) time and efficient method using the concept of Rolling Hash which takes linear time O(N).

### Example

**Input**:-

```
str1 = opengenus
str2 = open
```

**Output**:-

Yes.

"open" exists as a substring in opengenus starting from index 0 and ending at index 3.

**Input**:-

```
str1 = opengenus
str2 = opera
```

**Output**:-

No.

No continuous sequence of characters exists in "opengenus" which is equal to "opera".

## Approaches

We have provided two approaches:-

- Naive Method
**O(N^2) time and O(1) space** - Efficient method using rolling hash
**O(N) time and O(N) space**

## Naive Method

We can simply compare both strings character by character and stop when we find the entire second string in the first or when we reach the end of the first string. This would in worst case require to traverse the entire first string giving the time complexity of **O(length(str1))** or **O(n)** for ease and comparing all substrings results in O(N^2) time complexity.

For example,suppose we want to find whether string **iit** is present as a substring in the string **iiitian**.We will start taking substring of length equal to 3,i.e., length of the string **iit**, from the string **iiitian** and check whether it is equal to the required substring **iit** or not.If they are equal,return true else check next substring.If none of them matches,return false.Here firstly, **iii** will be checked.Since **iii** is not equal to **iit** , therefore next 3-letters substring starting from index 1(0 based),i.e, **iit** will be considered.Since it is equal to the required substring therefore true is returned.

Pseudocode:

```
is_substring(string str1, string str2)
{
for(int i=0; i<str1.length()-str2.length()+1; i++)
{
// substring of str1 of length of str2
string x=str1.substr(i,str2.length());
// takes O(N) time
if(x == str2)
return true;
}
// if str2 isn't a substring of str1
return false;
}
```

String comparison take linear time O(N) as each character needs to be compared one by one. To learn more about this idea, go through this article: String Hashing by OpenGenus.

Following is the C++ implementation of the naive approach:-

```
#include<bits/stdc++.h>
using namespace std;
bool isSubstring(string str1,string str2)
{
for(int i=0;i<str1.length()-str2.length()+1;i++)
{
// substring of str1 of length of str2
string x=str1.substr(i,str2.length());
if(x==str2)
return true;
}
// if str2 isn't a substring of str1
return false;
}
int main()
{
string str1,str2;
cout << "str1=";
cin >> str1;
cout << "str2=";
cin >> str2;
if(isSubstring(str1,str2))
cout << "Yes\n";
else
cout << "No\n";
return 0;
}
/*
Sample Input 1:-
str1= iiitian
str2= iit
Sample Output 1:-
Yes
Sample Input 2:-
str1= opengenus
str2= engine
Sample Output 2:-
No
*/
```

## Efficient Method (using Rolling Hash)

The idea is to compute the hash value of a string str1 which needs to be searched. Let this be H1. Then, we will generate hash value of subsequent sub-strings in str2 with the same length as str1 using the rolling hash method and compare it with H1.

We can efficiently find the hash values of all substrings of the first string str1 and store it in the array. Similarly find the hash value of the second string str2 and compare the hash values of the substrings of str1 with the hash value of the str2 using mapping to find whether the given string str2 exists as a substring in str1. The map **m2** is used to mark the integer 1 at the hash value corresponding to str2. The map **m1** maps the hash values corresponding to all the substrings of str1 to 1.So if the value of **m1** at the hash value corresponding to str2 is 1 that means a substring is present in str1 corresponding to the same hash value,i.e.,str2 is present as substring in str1.

We can also use **binary-search** instead of maps by storing the hash values corresponding to all the substrings of str1 in a sorted form in an array and then search for the hash value corresponding to str2 in the array using binary search in **O(log(n))** time complexity where n is the size of the array.You can read more about binary search here.

In general,the hash H can be defined as:-

**H=( c _{1}a^{k-1} + c_{2}a^{k-2} + c_{3}a^{k-3}. . . . + c_{k}a^{0} ) % m**

where a is a constant, c1,c2, ... ck are the input characters and m is a large prime number, since the probability of two random strings colliding is about â‰ˆ 1/m.

Then, the hash value of next substring,Hnxt using rolling hash can be defined as:-

**H**_{nxt}=( ( H - c_{1}a^{k-1}) * a + c_{k+1}a^{0}) % mLet's consider the same example, we want to find whether string **iit** is present as a substring in the string **iiitian**.Applying the above formula,the hash value of string **iit** would be **7955**.So **m2[7955]=1**.

When we find the hash values of all 3-letters substrings of **iiitian**, the hash value of the first substring **iii** would be **7944** and that of the next substring **iit** would be **7955**.Therefore **m1[7944]=1** and **m1[7955]=1**.We can map these hash values and when the hash value of the substring matches with the string,return true.In this case since **m2[7955]=m1[7955]=1**,i.e., the hash value of the substring **iit** matches , therefore true will be returned.

Note that you can implement this without the use of maps.

Pseudocode:

```
bool is_substring(string str1, string str2)
{
int j=0;
string prev = str1.substr(j,str2.length());
j++;
map<long long,int> m1, m2;
long long h2=compute_hash(str2);
// map 1 to the hash-value corresponding to str2
m2[h2]=1;
long long h1=compute_hash(prev);
// map 1 to the hash-values corresponding to all
// substrings of str1
m1[h1]=1;
for(int i=str2.length();i<str1.length();i++)
{
h1=rolling_hash(h1,prev,str1[i]);
m1[h1]=1;
// finds previously considered substring
prev=str1.substr(j,str2.length());
j++;
}
// if str2 isn't a substring of str1 returns 0 else 1
return m1[h2] == m2[h2];
}
```

Following is the complete C++ implementation:-

```
#include<bits/stdc++.h>
using namespace std;
// computes the hash value of the input string s
long long compute_hash(string s)
{
const int p = 31; // base
const int m = 1e9 + 9; // large prime number
long long hash_value = 0;
long long p_pow = (long long)pow(p,s.length()-1);
for (char c : s)
{
hash_value = (hash_value + (c - 'a') * p_pow) % m;
p_pow = (p_pow / p);
}
return hash_value;
}
// finds the hash value of next substring given nxt as the ending character
// and the previous substring prev
long long rolling_hash(long long H,string prev,char nxt)
{
const int p = 31;
const int m = 1e9 + 9;
long long Hnxt=( ( H - (prev[0]-'a')*(long long)pow(p,prev.length()-1) ) * p + (nxt-'a') ) % m;
return Hnxt;
}
bool isSubstring(string str1,string str2)
{
int j=0;
string prev=str1.substr(j,str2.length());
j++;
map<long long,int> m1,m2;
long long h2=compute_hash(str2);
// map 1 to the hash-value corresponding to str2
m2[h2]=1;
long long h1=compute_hash(prev);
// map 1 to the hash-values corresponding to all
// substrings of str1
m1[h1]=1;
for(int i=str2.length();i<str1.length();i++)
{
h1=rolling_hash(h1,prev,str1[i]);
m1[h1]=1;
// finds previously considered substring
prev=str1.substr(j,str2.length());
j++;
}
// if str2 isn't a substring of str1 returns 0 else 1
return m1[h2]==m2[h2];
}
int main()
{
string str1,str2;
cout << "str1=";
cin >> str1;
cout << "str2=";
cin >> str2;
if(isSubstring(str1,str2))
cout << "Yes\n";
else
cout << "No\n";
return 0;
}
/*
Sample Input 1:-
str1= iiitian
str2= iit
Sample Output 1:-
Yes
Sample Input 2:-
str1= opengenus
str2= engine
Sample Output 2:-
No
*/
```

## Complexity

- Worst case time complexity:
**O(n)** - Average case time complexity:
**O(log(n))** - Best case time complexity:
**O(1)** - Space complexity:
**O(n)**

We have still require to get all characters of string. So the worst case time complexity will remain **O(n)** but with the use of rolling hash,two strings can be compared in **O(1)** time complexity and all substrings can be compared in **O(log(n))** using maps or binary search. With the increase of the number of entries, the map's space will increase linearly. So space complexity is **O(n)**.

## References

- Rolling Hash by Ashutosh Singh
- String Hashing by OpenGenus