Longest repeating and non overlapping substring in a string


Reading time: 30 minutes | Coding time: 5 minutes

We have to find the longest repeating and non-overlapping substring in a given string. A substring is a contiguous sequence of characters within a string.We have to return the substring of maximum length which occurs more than once in the string without any overlap.We can return any such substring if more than one such substring exists in the string. We have provided an optimal O(n2) solution to this problem using Dynamic Programming.

The two approaches are:

  • Brute force O(N^3) time
  • Dynamic Programming O(N^2) time

Examples

Input:- largelargerlargest
Output:- large

Input:- banana
Output:- an or na

Input:- opengenus
Output:- en

Naive Approach

We can solve the problem in O(n3) time-complexity by finding all substrings in O(n2) and checking it with the substrings of the remaining string in O(n).

Following is the cpp implementaion of the naive approach:-

#include<bits/stdc++.h>
using namespace std;
int main()
{
	string s,ans;
	cout << "Enter the input string:-\n";
	cin >> s;
	int maxm=0;
	for(int i=0;i<s.length();i++)
	{
		for(int j=i;j<s.length();j++)
		{
			string x=s.substr(i,j-i+1);  //finding substrings of the string
			for(int k=j+1;k<s.length();k++)
			{
				string y=s.substr(k,j-i+1);
				if(y==x)	  //check for same non-overlapping substring in rest of string 
				{
					if(y.length()>maxm)
					{
						maxm=y.length();
						ans=y;
					}
				}
			}
		}
	}
	cout << "The longest repeating non-overlapping substring is:- " << ans << '\n';
	return 0;
}

Dynamic Programming Approach

  • Idea is to look for every same character and save its index.
  • Check whether difference between index is less than longest repeating and non-overlapping substring size.

Here, dp[i][j] stores length of the matching and non-overlapping substrings ending with i'th and j'th characters.

If the characters at (i-1)th and (j-1)th position matches dp[i-1][j-1] is less than the length of the considered substring (j-1) then maximum value of dp[i][j] and the maximum index i till this point is updated.The length of the longest repeating and non-overlapping substring can be found by the maximum value of dp[i][j] and the substring itself can be found using the length and the ending index which is the finishing index of the suffix.

For Example:-

Consider the word banana :
The only matching characters are a(2nd,4th and 6th) and n(3rd and 5th).Hence for all other indices i, dp[i][j]=0.

Initially finishing index ind=0 and maximum length of substring len=0.

When i=2 and j=4,
Since, dp[i-1][j-1]=0 which is less than (j-i) i.e. 2,
    dp[i][j]=dp[i-1][j-1]+1 ,
i.e.,dp[2][4]=1.

As dp[2][4]>len
    len=dp[2][4]=1
    ind=max(ind,i)=2
When i=2 and j=6,

Since dp[i-1][j-1]=dp[1][5]=0 which is less than (j-i) i.e. 4,
    dp[i][j]=dp[i-1][j-1]+1 ,
i.e.,dp[2][6]=1.

As dp[2][6]=len,
    len and ind wouldn't be updated.
When i=4 and j=6,

Since dp[i-1][j-1]=dp[3][5]=0 which is less than (j-i) i.e. 2,
    dp[i][j]=dp[i-1][j-1]+1 ,
i.e.,dp[4][6]=1.
As dp[4][6]=len,
    len and ind wouldn't be updated.

When i=3 and j=5,
Since dp[i-1][j-1]=dp[2][4]=1 which is less than (j-i) i.e. 2,
    dp[i][j]=dp[i-1][j-1]+1 ,
i.e.,dp[3][5]=2.
As dp[3][5]>len
    len=dp[3][5]=2
    ind=max(ind,i)=3

The final matrix dp[][] would be:-

j=0 1 2 3 4 5 6
i=0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
2 0 0 0 0 1 0 1
3 0 0 0 0 0 2 0
4 0 0 0 0 0 0 1
5 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0

To retrieve the longest repeating and non-overlapping substring,we store the characters present from position (ind-len+1) till position (ind),i.e., the substring from position 2 till position 3 giving the substring an as the result.

Pseudo Code:

    if(str[i-1] == str[j-1] && dp[i-1][j-1] < (j - i)) 
        { 
            dp[i][j] = dp[i-1][j-1] + 1; 
            if (dp[i][j] > len) 
            { 
                len = dp[i][j]; 
                ind = max(i, ind); 
            } 
        } 
        else
            dp[i][j] = 0; 
   // where i iterates from 1 to n and j iterates from
   // i+1 to n where n is the string size
   if(len > 0)   
   {
   for (int i = ind - len + 1; i <= ind; i++) 
    ans.push_back(str[i-1]); 
    }
 return ans

Implementations

We have provided 3 implementations (C++,Java,Python).

Following is the implementation in C++:-

// C++ program to find the longest repeated 
// non-overlapping substring 
#include<bits/stdc++.h> 
using namespace std; 
  
// Returns the longest repeating non-overlapping 
// substring in str 
string longestRepeatedSubstring(string str) 
{ 
    int n = str.length(); 
    int dp[n+1][n+1]; 

    memset(dp, 0, sizeof(dp));  // Initializing dp to 0
		
    string ans; 	// To store resultant substring 
    int len = 0; // To store length of result 
  
    // building table in bottom-up manner 
    int ind = 0; 
    for (int i=1; i<=n; i++) 
    { 
        for (int j=i+1; j<=n; j++) 
        { 
            
            // if the characters at (i-1)th and (j-1)th 
			// position matches and codition for
			// removing overlapping is satisfied
            if(str[i-1] == str[j-1] && dp[i-1][j-1] < (j - i)) 
            { 
                dp[i][j] = dp[i-1][j-1] + 1; 
  
                // updating maximum length of the 
                // substring and updating the finishing 
                // index of the suffix 
                if (dp[i][j] > len) 
                { 
                    len = dp[i][j]; 
                    ind = max(i, ind); 
                } 
            } 
            else
                dp[i][j] = 0; 
        } 
    } 
  
    // If we have non-empty result, then insert all 
    // characters from first character to last 
    // character of string 
    if (len > 0) 
        {
			for (int i = ind - len + 1; i <= ind; i++) 
            ans.push_back(str[i-1]); 
		}
  
    return ans; 
} 
  
// Driver program to test the above function 
int main() 
{ 
	string s;
    cout << "Enter the input string:- ";
	cin >> s;
    cout << "The longest repeating non-overlapping substring is:- " << longestRepeatedSubstring(s) << '\n'; 
    return 0; 
} 
/*
Sample Input 1:-
    Enter the input string:- opengenus
Sample Output 1:-
    The longest repeating non-overlapping substring is:- en
Sample Input 2:-
    Enter the input string:- aaaaaa
Sample Output 2:-
    The longest repeating non-overlapping substring is:- aaa
 */

Following is the implementation in Java:-

import java.util.*;
class nonRepeativeSubstring
 {
  public static String nonRepeativeSubstr(String str,int n)
  {
   int dp[][]=new int[n+1][n+1];
   int max=0,index=0;
   for(int i=1;i<=n;++i)
   {
    for(int j=i+1;j<=n;++j)
    {
     if(str.charAt(i-1)==str.charAt(j-1) && j-i>dp[i-1][j-1])
     {dp[i][j]=dp[i-1][j-1]+1;
     if(max<dp[i][j])
         {
           max=dp[i][j];
//save last index of substring
           index=Math.max(i,index);
         }
     }
     else
      dp[i][j]=0;
    }
   }
   return max>0?str.substring(index-max,index):"-1";
  }
 public static void main (String[] args)
  {
 Scanner scan=new Scanner(System.in);
 System.out.println("Enter the input string:-");
     String s=scan.nextLine();
     System.out.println("The longest repeating non-overlapping substring is:- "+nonRepeativeSubstr(s,s.length());
 }
  }
}
/*
Sample Input 1:-
    Enter the input string:- opengenus
Sample Output 1:-
    The longest repeating non-overlapping substring is:- en
Sample Input 2:-
    Enter the input string:- aaaaaa
Sample Output 2:-
    The longest repeating non-overlapping substring is:- aaa
*/

Following is the implementation in Python:

# Returns the longest repeating non-overlapping 
# substring in str 
def longestRepeatedSubstring(str): 
  
    n = len(str) 
    dp = [[0 for x in range(n + 1)]     # Initializing dp to 0
                for y in range(n + 1)] 
  
    ans = ""     # To store resultant substring 
    ans_len = 0 # To store length of result 
  
    # building table in bottom-up manner 
    ind = 0
    for i in range(1, n + 1): 
        for j in range(i + 1, n + 1): 
              
            # if the characters at (i-1)th and (j-1)th 
			# position matches and codition for
			# removing overlapping is satisfied
            if (str[i - 1] == str[j - 1] and dp[i - 1][j - 1] < (j - i)): 
                dp[i][j] = dp[i - 1][j - 1] + 1

            # updating maximum length of the 
            # substring and updating the finishing 
            # ind of the suffix 
            if (dp[i][j] > ans_len): 
                ans_len = dp[i][j] 
                ind = max(i, ind) 
                  
            else: 
                dp[i][j] = 0
  
    # If we have non-empty result, then insert  
    # all characters from first character to  
    # last character of string 
    if (ans_len > 0): 
        for i in range(ind - ans_len + 1,ind + 1): 
            ans = ans + str[i - 1] 
  
    return ans 
  
# Driver Code 
if __name__ == "__main__": 
      
    str=input("Enter the input string: ")
    print("The longest repeating non-overlapping substring is:- ",longestRepeatedSubstring(str)) 
#Sample Input 1:-
#    Enter the input string:- opengenus
#Sample Output 1:-
#    The longest repeating non-overlapping substring is:- en
#Sample Input 2:-
#    Enter the input string:- aaaaaa
#Sample Output 2:-
#    The longest repeating non-overlapping substring is:- aaa

Complexity

  • Worst case time complexity: O(n2)
  • Average case time complexity: O(n2)
  • Best case time complexity: O(n2)
  • Space complexity: O(n2)

Building a 2-D table requires two for loops hence the time-complexity of this algorithm will be O(n2).Also making a 2-D table would require n2 space hence the space complexity of this algorithm will also be O(n2).

With this, you have the complete knowledge of this problem. Enjoy.