Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
Reading time: 30 minutes | Coding time: 5 minutes
We have to find the longest repeating and non-overlapping substring in a given string. A substring is a contiguous sequence of characters within a string.We have to return the substring of maximum length which occurs more than once in the string without any overlap.We can return any such substring if more than one such substring exists in the string. We have provided an optimal O(n2) solution to this problem using Dynamic Programming.
The two approaches are:
- Brute force O(N^3) time
- Dynamic Programming O(N^2) time
Examples
Input:- largelargerlargest
Output:- large
Input:- banana
Output:- an or na
Input:- opengenus
Output:- en
Naive Approach
We can solve the problem in O(n3) time-complexity by finding all substrings in O(n2) and checking it with the substrings of the remaining string in O(n).
Following is the cpp implementaion of the naive approach:-
#include<bits/stdc++.h>
using namespace std;
int main()
{
string s,ans;
cout << "Enter the input string:-\n";
cin >> s;
int maxm=0;
for(int i=0;i<s.length();i++)
{
for(int j=i;j<s.length();j++)
{
string x=s.substr(i,j-i+1); //finding substrings of the string
for(int k=j+1;k<s.length();k++)
{
string y=s.substr(k,j-i+1);
if(y==x) //check for same non-overlapping substring in rest of string
{
if(y.length()>maxm)
{
maxm=y.length();
ans=y;
}
}
}
}
}
cout << "The longest repeating non-overlapping substring is:- " << ans << '\n';
return 0;
}
Dynamic Programming Approach
- Idea is to look for every same character and save its index.
- Check whether difference between index is less than longest repeating and non-overlapping substring size.
Here, dp[i][j] stores length of the matching and non-overlapping substrings ending with i'th and j'th characters.
If the characters at (i-1)th and (j-1)th position matches dp[i-1][j-1] is less than the length of the considered substring (j-1) then maximum value of dp[i][j] and the maximum index i till this point is updated.The length of the longest repeating and non-overlapping substring can be found by the maximum value of dp[i][j] and the substring itself can be found using the length and the ending index which is the finishing index of the suffix.
For Example:-
Consider the word banana :
The only matching characters are a(2nd,4th and 6th) and n(3rd and 5th).Hence for all other indices i, dp[i][j]=0.
Initially finishing index ind=0 and maximum length of substring len=0.
When i=2 and j=4,
Since, dp[i-1][j-1]=0 which is less than (j-i) i.e. 2,
dp[i][j]=dp[i-1][j-1]+1 ,
i.e.,dp[2][4]=1.
As dp[2][4]>len
len=dp[2][4]=1
ind=max(ind,i)=2
When i=2 and j=6,
Since dp[i-1][j-1]=dp[1][5]=0 which is less than (j-i) i.e. 4,
dp[i][j]=dp[i-1][j-1]+1 ,
i.e.,dp[2][6]=1.
As dp[2][6]=len,
len and ind wouldn't be updated.
When i=4 and j=6,
Since dp[i-1][j-1]=dp[3][5]=0 which is less than (j-i) i.e. 2,
dp[i][j]=dp[i-1][j-1]+1 ,
i.e.,dp[4][6]=1.
As dp[4][6]=len,
len and ind wouldn't be updated.
When i=3 and j=5,
Since dp[i-1][j-1]=dp[2][4]=1 which is less than (j-i) i.e. 2,
dp[i][j]=dp[i-1][j-1]+1 ,
i.e.,dp[3][5]=2.
As dp[3][5]>len
len=dp[3][5]=2
ind=max(ind,i)=3
The final matrix dp[][] would be:-
j=0 | 1 | 2 | 3 | 4 | 5 | 6 | |
---|---|---|---|---|---|---|---|
i=0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
3 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
To retrieve the longest repeating and non-overlapping substring,we store the characters present from position (ind-len+1) till position (ind),i.e., the substring from position 2 till position 3 giving the substring an as the result.
Pseudo Code:
if(str[i-1] == str[j-1] && dp[i-1][j-1] < (j - i))
{
dp[i][j] = dp[i-1][j-1] + 1;
if (dp[i][j] > len)
{
len = dp[i][j];
ind = max(i, ind);
}
}
else
dp[i][j] = 0;
// where i iterates from 1 to n and j iterates from
// i+1 to n where n is the string size
if(len > 0)
{
for (int i = ind - len + 1; i <= ind; i++)
ans.push_back(str[i-1]);
}
return ans
Implementations
We have provided 3 implementations (C++,Java,Python).
Following is the implementation in C++:-
// C++ program to find the longest repeated
// non-overlapping substring
#include<bits/stdc++.h>
using namespace std;
// Returns the longest repeating non-overlapping
// substring in str
string longestRepeatedSubstring(string str)
{
int n = str.length();
int dp[n+1][n+1];
memset(dp, 0, sizeof(dp)); // Initializing dp to 0
string ans; // To store resultant substring
int len = 0; // To store length of result
// building table in bottom-up manner
int ind = 0;
for (int i=1; i<=n; i++)
{
for (int j=i+1; j<=n; j++)
{
// if the characters at (i-1)th and (j-1)th
// position matches and codition for
// removing overlapping is satisfied
if(str[i-1] == str[j-1] && dp[i-1][j-1] < (j - i))
{
dp[i][j] = dp[i-1][j-1] + 1;
// updating maximum length of the
// substring and updating the finishing
// index of the suffix
if (dp[i][j] > len)
{
len = dp[i][j];
ind = max(i, ind);
}
}
else
dp[i][j] = 0;
}
}
// If we have non-empty result, then insert all
// characters from first character to last
// character of string
if (len > 0)
{
for (int i = ind - len + 1; i <= ind; i++)
ans.push_back(str[i-1]);
}
return ans;
}
// Driver program to test the above function
int main()
{
string s;
cout << "Enter the input string:- ";
cin >> s;
cout << "The longest repeating non-overlapping substring is:- " << longestRepeatedSubstring(s) << '\n';
return 0;
}
/*
Sample Input 1:-
Enter the input string:- opengenus
Sample Output 1:-
The longest repeating non-overlapping substring is:- en
Sample Input 2:-
Enter the input string:- aaaaaa
Sample Output 2:-
The longest repeating non-overlapping substring is:- aaa
*/
Following is the implementation in Java:-
import java.util.*;
class nonRepeativeSubstring
{
public static String nonRepeativeSubstr(String str,int n)
{
int dp[][]=new int[n+1][n+1];
int max=0,index=0;
for(int i=1;i<=n;++i)
{
for(int j=i+1;j<=n;++j)
{
if(str.charAt(i-1)==str.charAt(j-1) && j-i>dp[i-1][j-1])
{dp[i][j]=dp[i-1][j-1]+1;
if(max<dp[i][j])
{
max=dp[i][j];
//save last index of substring
index=Math.max(i,index);
}
}
else
dp[i][j]=0;
}
}
return max>0?str.substring(index-max,index):"-1";
}
public static void main (String[] args)
{
Scanner scan=new Scanner(System.in);
System.out.println("Enter the input string:-");
String s=scan.nextLine();
System.out.println("The longest repeating non-overlapping substring is:- "+nonRepeativeSubstr(s,s.length());
}
}
}
/*
Sample Input 1:-
Enter the input string:- opengenus
Sample Output 1:-
The longest repeating non-overlapping substring is:- en
Sample Input 2:-
Enter the input string:- aaaaaa
Sample Output 2:-
The longest repeating non-overlapping substring is:- aaa
*/
Following is the implementation in Python:
# Returns the longest repeating non-overlapping
# substring in str
def longestRepeatedSubstring(str):
n = len(str)
dp = [[0 for x in range(n + 1)] # Initializing dp to 0
for y in range(n + 1)]
ans = "" # To store resultant substring
ans_len = 0 # To store length of result
# building table in bottom-up manner
ind = 0
for i in range(1, n + 1):
for j in range(i + 1, n + 1):
# if the characters at (i-1)th and (j-1)th
# position matches and codition for
# removing overlapping is satisfied
if (str[i - 1] == str[j - 1] and dp[i - 1][j - 1] < (j - i)):
dp[i][j] = dp[i - 1][j - 1] + 1
# updating maximum length of the
# substring and updating the finishing
# ind of the suffix
if (dp[i][j] > ans_len):
ans_len = dp[i][j]
ind = max(i, ind)
else:
dp[i][j] = 0
# If we have non-empty result, then insert
# all characters from first character to
# last character of string
if (ans_len > 0):
for i in range(ind - ans_len + 1,ind + 1):
ans = ans + str[i - 1]
return ans
# Driver Code
if __name__ == "__main__":
str=input("Enter the input string: ")
print("The longest repeating non-overlapping substring is:- ",longestRepeatedSubstring(str))
#Sample Input 1:-
# Enter the input string:- opengenus
#Sample Output 1:-
# The longest repeating non-overlapping substring is:- en
#Sample Input 2:-
# Enter the input string:- aaaaaa
#Sample Output 2:-
# The longest repeating non-overlapping substring is:- aaa
Complexity
- Worst case time complexity:
O(n2)
- Average case time complexity:
O(n2)
- Best case time complexity:
O(n2)
- Space complexity:
O(n2)
Building a 2-D table requires two for loops hence the time-complexity of this algorithm will be O(n2).Also making a 2-D table would require n2 space hence the space complexity of this algorithm will also be O(n2).
With this, you have the complete knowledge of this problem. Enjoy.