Number of distinct substrings of length K

Internship at OpenGenus

Get FREE domain for 1st year and build your brand new site

In this article, we have explained the approach to find the Number of distinct substrings of length K using Rolling hash technique, hash table and brute force approach.

Table of content:

  1. Problem Statement
  2. Method 1 - Brute Force
  3. Method 2 - Using hash table
  4. Method 3 - Rolling hash algorithm

We will dive into the problem "Number of distinct substrings of length K" now.

Problem Statement

Given a string of lowercase alphabets and an integer k as input, print the count of all possible substrings which has exactly k distinct characters.

Enter the string: abcd
Enter k value: 1
Total substrings with exactly 1 distinct characters : 4

Enter the string: mqtt
Enter k value: 2
Total substrings with exactly 2 distinct characters : 3

Enter the string: aaaa
Enter k value: 2
Total substrings with exactly 2 distinct characters : 0

Method 1 - Brute Force

Assuming that the length of string is n, then there can be n(n+1)/2 possible substrings*.A simple and direct method would be to generate all of then and check whether it has exactly k unique characters or not.

Looping through and generating all the combinations and then checking for the distinct characters is costly and results in higher complexity.

n(n+1)/2 = (n^2+1)/2 = O(n^2)

So, the order of complexity to generate string would be O(n^2).
Next step is to check the each sustring which will take O(n),making overall complexity O(n^3).
Space complexity - O(n)

Method 2 - Using hash table

Consider the string contains only smallcase alphabets. Main concept is to maintain the hashtable for the substring and check for no. of unique characters.

What is HashTable?It is a data structure that implements an an array data type in which each one is associated with the value i.e it has key and value. It is also called as hashmap.

  • First,Initialize the required variables for the computation in the program like count and an array to store characters.
  • Loop throughtout to find the substrings and further the number of distinct characters in it.
  • Finally,return the result.
def count_k_dist(input_str, k):
    n = len(input_str)

    # Initializing res to 0,return after res after counting
    res = 0

    # To store count of characters from
    # 'a' to 'z'
    cnt = [0] * 27

    # For considering all substring beginning with input_str[i]
    for i in range(0, n):
        dist_count = 0

        # Initializing array with 0
        cnt = [0] * 27

        # Considering all substrings between str[i..j]
        for j in range(i, n):

            # If this is a new character for this
            # substring, increment dist_count.
            # ord(char) gives the ASCII value

            if cnt[ord(input_str[j]) - 97] == 0:
                dist_count += 1

            # Increment count of current character
            cnt[ord(input_str[j]) - 97] += 1

            # If distinct character count becomes k,
            # then increment result.
            if dist_count == k:
                res += 1
            if dist_count > k:
                break

    return res


if __name__ == "__main__":
    str1 = input("Enter the string: ")
    k = int(input("Enter k value: "))
    print("Total substrings with exactly", k,
          "distinct characters : ", end="")
    print(count_k_dist(str1, k))

Time Complexity : We have used two nested for loops, which gives the time complexity as O(n * n)
Space Complexity : O(n)

Method 3 - Rolling hash algorithm

A rolling hash (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input.

  • First, find the hash value of first sub-string of length ā€˜lā€™.
  • With the help of it, generate hashes of all substrings and push them in an unordered set.
  • Count number of distinct values in the set.
  • Since there are 26 characters intialize x with 26.
  • Find the hash value and add it to the unordered set(elements are unique and not ordered).
  • Using the hash value find the number of distinct substrings.
# Python3 implementation of
x = 26
mod = 3001


# Function to find the required count
def count_substring(s, l):
    # Variable to the hash_val
    hash_val = 0

    # Finding hash_val of substring
    # (0, l-1) using random number x
    
    for i in range(l):
        hash_val = (hash_val * x + (ord(s[i]) - 97)) % mod

    # Computing x^(l-1)
    pow_l = 1
    for i in range(l - 1):
        pow_l = (pow_l * x) % mod

    # Unordered set to add hash_val values
    result = set()
    result.add(hash_val)

    # Generating all possible hash_val 
    for i in range(l, len(s)):
        hash_val = ((hash_val - pow_l * (ord(s[i - l]) - 97) + 2 * mod) * x + (ord(s[i]) - 97)) % mod

        result.add(hash_val)

    # Print the result
    print(len(result))


# Driver Code
if __name__ == "__main__":
    str1 = input("Enter the string:")
    k = int(input("Enter k value:"))

    count_substring(str1, k)

Time Complexity : O(n)
Space Complexity: O(n)

Hope you enjoyed solving this problem and learnt how to solve it using this article at OpenGenus.
Have a good day and keep learning.