In this article, we have explained the approach to find the Number of distinct substrings of length K using Rolling hash technique, hash table and brute force approach.
Table of content:
- Problem Statement
- Method 1 - Brute Force
- Method 2 - Using hash table
- Method 3 - Rolling hash algorithm
We will dive into the problem "Number of distinct substrings of length K" now.
Given a string of lowercase alphabets and an integer k as input, print the count of all possible substrings which has exactly k distinct characters.
Enter the string: abcd Enter k value: 1 Total substrings with exactly 1 distinct characters : 4 Enter the string: mqtt Enter k value: 2 Total substrings with exactly 2 distinct characters : 3 Enter the string: aaaa Enter k value: 2 Total substrings with exactly 2 distinct characters : 0
Method 1 - Brute Force
Assuming that the length of string is n, then there can be n(n+1)/2 possible substrings*.A simple and direct method would be to generate all of then and check whether it has exactly k unique characters or not.
Looping through and generating all the combinations and then checking for the distinct characters is costly and results in higher complexity.
n(n+1)/2 = (n^2+1)/2 = O(n^2)
So, the order of complexity to generate string would be O(n^2).
Next step is to check the each sustring which will take O(n),making overall complexity O(n^3).
Space complexity - O(n)
Method 2 - Using hash table
Consider the string contains only smallcase alphabets. Main concept is to maintain the hashtable for the substring and check for no. of unique characters.
What is HashTable?It is a data structure that implements an an array data type in which each one is associated with the value i.e it has key and value. It is also called as hashmap.
- First,Initialize the required variables for the computation in the program like count and an array to store characters.
- Loop throughtout to find the substrings and further the number of distinct characters in it.
- Finally,return the result.
def count_k_dist(input_str, k): n = len(input_str) # Initializing res to 0,return after res after counting res = 0 # To store count of characters from # 'a' to 'z' cnt =  * 27 # For considering all substring beginning with input_str[i] for i in range(0, n): dist_count = 0 # Initializing array with 0 cnt =  * 27 # Considering all substrings between str[i..j] for j in range(i, n): # If this is a new character for this # substring, increment dist_count. # ord(char) gives the ASCII value if cnt[ord(input_str[j]) - 97] == 0: dist_count += 1 # Increment count of current character cnt[ord(input_str[j]) - 97] += 1 # If distinct character count becomes k, # then increment result. if dist_count == k: res += 1 if dist_count > k: break return res if __name__ == "__main__": str1 = input("Enter the string: ") k = int(input("Enter k value: ")) print("Total substrings with exactly", k, "distinct characters : ", end="") print(count_k_dist(str1, k))
Time Complexity : We have used two nested for loops, which gives the time complexity as O(n * n)
Space Complexity : O(n)
Method 3 - Rolling hash algorithm
A rolling hash (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input.
- First, find the hash value of first sub-string of length ‘l’.
- With the help of it, generate hashes of all substrings and push them in an unordered set.
- Count number of distinct values in the set.
- Since there are 26 characters intialize x with 26.
- Find the hash value and add it to the unordered set(elements are unique and not ordered).
- Using the hash value find the number of distinct substrings.
# Python3 implementation of x = 26 mod = 3001 # Function to find the required count def count_substring(s, l): # Variable to the hash_val hash_val = 0 # Finding hash_val of substring # (0, l-1) using random number x for i in range(l): hash_val = (hash_val * x + (ord(s[i]) - 97)) % mod # Computing x^(l-1) pow_l = 1 for i in range(l - 1): pow_l = (pow_l * x) % mod # Unordered set to add hash_val values result = set() result.add(hash_val) # Generating all possible hash_val for i in range(l, len(s)): hash_val = ((hash_val - pow_l * (ord(s[i - l]) - 97) + 2 * mod) * x + (ord(s[i]) - 97)) % mod result.add(hash_val) # Print the result print(len(result)) # Driver Code if __name__ == "__main__": str1 = input("Enter the string:") k = int(input("Enter k value:")) count_substring(str1, k)
Time Complexity : O(n)
Space Complexity: O(n)
Hope you enjoyed solving this problem and learnt how to solve it using this article at OpenGenus.
Have a good day and keep learning.