Longest Substring with at Least K repeating characters

Internship at OpenGenus

Get this book -> Problems on Array: For Interviews and Competitive Programming

In this article, we have presented two algorithms to find the Longest Substring with at Least K repeating characters.

Table of contents:

  1. Problem Statement
  2. Approach 1
  3. Approach 2

Problem Statement

Before we dive into the problem, lets understand the problem statement more clearly:

Given a string s and an integer k, find the length of the longest substring such that the frequency of each character in this substring is greater than or equal to k

That sounds easy enough right? Lets look at a few examples:

s = "xyxyxz"
k = 2

Here the substring would be "xyxyx", and the length is 5.

Lets look at another example:

s = "blame"
k = 3

Here there is no such substring so the the answer would be 0.

Approach 1

Lets now move ahead with the solution.

So the naive approach is very simple. All you have to do is generate all the substrings of the given string, iterate over each string to find out if it has atleast k repeating characters, and find the longest string among them.

Lets code out this approach

def check(string, k):

    # We need to find if the characters of the given string
    # have a frequency of atleast k
    
    fre = {} # For character frequencies
    
    # Compute the frequencies of all characters
    for char in string:
        if char not in fre: fre[char] = 1
        else: fre[char] += 1
    
    # Check if any character has frequency less than k
    for char in fre:
        if fre[char]<k:
            return False
    
    # If all characters satisify
    return True
    

def longestSubstring(s, k):

    # Let us generate all the substrings and store them in a list
    
    substrings = []
    for i in range(len(s)):
        for j in range(i+1,len(s)+1):
            substring = s[i:j]
            substrings.append(substring)
    
    # Now that we have all the substrings
    # We have to check if a given substring satisfies the condition
    
    ans = 0 # Result substring length
    
    for sub in substrings:
        # Check function to see if substring satisfies
        if check(sub, k) == True:
            # If it satisfies, update the ans value
            ans = max(len(sub), ans)
    
    return ans
        
    
# ------ Code begins here ----------
s = input()
k = int(input())
print(longestSubstring(s,k))

So the approach is fail proof - it works for any string that we give. We first compute all the substrings of that string and we check for each substring if all the characters in it have a frequency greater than or equal to K. And everytime we find such a string, we check if its the longest string we found, keeping track of the lengths with the ans variable.

Time and Space Complexity:

This approach looks great doesn't it? It solves the problem perfectly, doesn't use any extra space. The program needs O(N2) time to find the solution, since the generation of the substrings would be taking N2 time.

Approach 2

Let us try to increase the speed a little bit, lets see if we can try a few optimisations.

Well, to begin with, we dont need to generate each and every substring. Let me explain more elaborately. Lets say we have a string S, and we were given the value of K as 3. Now, there would some some character char in S, whose frequency is less than 3, so there is no point in checking all the substrings that contain char.

A way to approach this is by using Divide-and-Conquer strategy. The intuition is pretty simple, we create a map of all the characters and frequencies. If a character's frequency is less then k, we break the string into substrings at the indices of the character and recursively run our algorithm over these substrings. Or, if none of the characters have a frequency less than k, that would mean that we found the required string - and we check its length to see if its the longest one.

Let us look at the code

def longestSubstring(s, k, ans):
    
    # Lets find the frequency of the characters
    freq = {}
    for i in s:
        freq[i] = freq.get(i,0)+1
        
    # A flag variable to see if any character has freq less than k
    flag = False
    
    # Iterate through the string
    for i in s:
        if freq[i]<k:
            # Character with freq<k is found
            flag = True
            
            # split() function is to break the string at that particular character
            new = s.split(i)
            for st in new:
                # We now recursively check for each substring
                x = longestSubstring(st,k,ans)
                ans = max(ans,x)
            break
    if not flag:
        # If all the char have freq>=k
        ans = max(ans,len(s))
    return ans
        
    
# ------ Code begins here ----------
s = input()
k = int(input())
print(longestSubstring(s,k,0))

Time and Space Complexity:

The recursive approach still takes O(N2) time to find the solution in the worst case scenario. It takes O(N) for the maximum depth of the recursive call and O(N) to build a frequency map for each substring.

It still improves the runtime in many cases, we can avoid redundancies with this approach, although there is a slight giveaway - we would be needing an extra O(N) space to implement this, since the recursion stack needs N space.

With this article at OpenGenus, you must have the complete knowledge of how to find the Longest Substring with at Least K repeating characters.