Number of substrings with exactly k distinct characters

Binary Tree Problems books

Get FREE domain for 1st year and build your brand new site

In this OpenGenus article, we will be solving Find number of substrings with exactly k distinct characters problem. In this problem, we are given a string and our job is to count the total number of substrings of the given string that contains exactly k distinct characters. If no substring exists that contains exactly k distinct characters then return -1.

For example:

Input: xyz, k = 2
Output: 2
substrings are {"xy", "yz"}

Input: xyx, k = 2
Output: 3
substrings are {"xy", "yx", "xyx"}

Input: zz, k = 1
Output: 3
substrings are {"z", "z", "zz"}

We have explored three approaches:

  • Brute force approach O(N^3) time
  • Improved Brute force approach O(N^2) time
  • Efficient approach O(N) time

The efficient approach brings in ideas of Dynamic Programming and Sliding Window Approach.

Brute Force approach

The brute force method is to generate all the possible substrings and check if distinct characters in these substrings are equal to k. If it is equal to k then increment the count by 1.

It will take O(n * n) time to generate the substrings and O(n) to check if distinct characters in these substrings are equal to k. Therefore, the time complexity of this approach is O( n * n * n) that is O(N^3).

Optimizing Brute Force approach

In this approach, we will use a hash table to optimize the above approach. We will maintain a hash table while generating substrings and check distinct characters in the substring with the help of a hash table.

Algorithm

The steps are:

Step 1: We will use two nested for loops to generate all the substrings
Step 2: The outer loop loops from i equal to 0 to string length minus 1.
Step 3: The inner loop loops from j equal to i to string length minus 1.
Step 4: We will create a count variable that initially has a value of 0.
Step 5: Inside the outer loop, we will create a distinct_character variable that initially has a value of 0.
Step 6: We will create a frequency array of size 26 and set all the elements to zero.
Step 7: Inside the inner loop, we will check whether the frequency of the string[j] - 'a' is zero.
Step 8: If frequency of string[j] - 'a' is zero then increment distinct_character by 1.
Step 9: Increment the frequency of string[j] - 'a' by 1.
Step 10: If distinct_character is equal to k then increment the count by 1.
Step 11: Else if distinct_character is greater than k then break out of the inner loop.
Step 12: If the count is greater than zero then return count.
Step 13: else return -1.

Time and Space complexity.

  • The time complexity is O(n * n) because we are using two nested loops to generate all the possible substrings.
  • The space complexity is O(1) because the space used by the program is not dependent on the string length.

where n is the length of the given string.

Example.

The given string is xyzyxx
The value of k is 3.

The substrings starting with x
substring x.
The frequency of x is 0.
increment distinct character by 1
distinct character is equal to 1( 0 + 1)
increment frequency of x
frequency of x is equal to 1.

substring xy.
the frequency of y is 0.
increment distinct character by 1
distinct character is equal to 2( 1 + 1)
increment frequency of y
frequency of y is equal to 1.

substring xyz
the frequency of z is 0.
increment distinct character by 1
distinct character is equal to 3( 2 + 1)
increment frequency of z
frequency of z is equal to 1.
distinct character is equal to k therefore, increment count by 1.
count is equal to 1( 0 + 1 ).

substring xyzy
the frequency of y is 1.
increment frequency of y.
frequency of y is equal to 2.
distinct character is equal to k therefore, increment count by 1.
count is equal to 2( 1 + 1 ).

substring xyzyx
the frequency of x is 2.
increment frequency of x.
frequency of x is equal to 2.
distinct character is equal to k therefore, increment count by 1.
count is equal to 3( 2 + 1 ).

substring xyzyxx
the frequency of x is 2.
increment frequency of x.
frequency of x is equal to 3.
distinct character is equal to k therefore, increment count by 1.
count is equal to 4( 3 + 1 ).

The substrings starting with y.
substring y.
The frequency of y is 0.
increment distinct character by 1
distinct character is equal to 1( 0 + 1)
increment frequency of y
frequency of y is equal to 1.

substring yz.
The frequency of z is 0.
increment distinct character by 1
distinct character is equal to 2( 1 + 1)
increment frequency of z
frequency of z is equal to 1.

substring yzy.
The frequency of y is 1.
increment frequency of y
frequency of y is equal to 2.

substring yzyx.
The frequency of x is 0.
increment distinct character by 1
distinct character is equal to 3( 2 + 1)
increment frequency of x
frequency of x is equal to 1.
distinct character is equal to k therefore, increment count by 1.
count is equal to 5( 4 + 1 ).

substring yzyxx.
The frequency of x is 1.
increment frequency of x
frequency of x is equal to 2.
distinct character is equal to k therefore, increment count by 1.
count is equal to 6( 5 + 1 ).

The substrings starting with z.
substring z.
The frequency of z is 0.
increment distinct character by 1
distinct character is equal to 1( 0 + 1)
increment frequency of z
frequency of z is equal to 1.

substring zy
The frequency of y is 0.
increment distinct character by 1
distinct character is equal to 2( 1 + 1)
increment frequency of y
frequency of y is equal to 1.

substring zyx
The frequency of x is 0.
increment distinct character by 1
distinct character is equal to 3( 2 + 1)
increment frequency of x
frequency of x is equal to 1.
distinct character is equal to k therefore, increment count by 1.
count is equal to 7( 6 + 1 ).

substring zyxx
The frequency of x is 1.
increment frequency of x
frequency of x is equal to 2.
distinct character is equal to k therefore, increment count by 1.
count is equal to 8( 7 + 1 ).

The substrings starting with y
substring y.
The frequency of y is 0.
increment distinct character by 1
distinct character is equal to 1(0 + 1)
increment frequency of y
frequency of y is equal to 1.

substring yx.
The frequency of x is 0.
increment distinct character by 1
distinct character is equal to 2( 1 + 1)
increment frequency of x
frequency of x is equal to 1.

substring yxx
The frequency of x is 1.
increment frequency of x.
frequency of x is equal to 2.

The substrings starting with x.
substring x.
The frequency of x is 0.
increment distinct character by 1.
distinct character is equal to 1( 0 + 1)
increment frequency of x.
frequency of x is equal to 1.

substring xx.
The frequency of x is 1.
increment frequency of x.
frequency of x is equal to 2.

The substrings starting with x.
substring x.
The frequency of x is 0.
increment distinct character by 1.
distinct character is equal to 1( 0 + 1)
increment frequency of x.
frequency of x is equal to 1.

After all possible substrings, the value of count is equal to 8.
So there are 8 substrings with exactly 3 distinct characters.

Implementing the above algorithm.

  • C++

C++

#include <iostream>
using namespace std;

/*function that returns the total no of substrings with exactly k distinct characters*/ int substring_with_k_characters(string str, int k) { int n = str.length(); //Initialize count int count = 0;
// To store the frequency of characters from 'a' to 'z' int frequency[26];
/* Consider all substrings beginning with str[i]*/ for (int i = 0; i < n; i++) { int distinct_character = 0;
for(int j=0;j< 26;j++){ frequency[j]=0; } for (int j=i; j<n; j++) { if (frequency[str[j] - 'a'] == 0){ distinct_character++; } // Increment frequency of the current character frequency[str[j] - 'a']++;
/* If distinct character count becomes k,then increment count.*/ if (distinct_character == k){ count++; } if(distinct_character > k){ break; } } } return count; }
// main function int main() { string str = "xyzyxx"; int k = 3; cout << "Total substrings with exactly " << k << " distinct characters: " << substring_with_k_characters(str,k) << endl; return 0; }

Output.

Total substrings with exactly 3 distinct characters: 8

Efficient approach

In this approach, we will count the substrings with atmost k distinct characters and substrings with atmost k-1 distinct characters. The count of substrings with exactly k distinct characters is equal to count of substrings with atmost k distinct characters minus count of substrings with atmost k-1 characters.

Key idea:

# substrings with k distinct characters = 
       # substrings with atmost k distinct characters - 
       # substrings with atmost k-1 characters

How to count substrings with atmost k distinct characters?

We will be using map and sliding window technique to count the substrings with atmost k distinct characters.

Algorithm

The steps are as follows:

Step 1: Create a map called frequency.
Step 2: Create variables called j and count and initialize them to 0.
Step 3: We will traverse the string from left to right using variable i.
Step 4: We will increment the frequency of the current character by 1. (frequency[string[i]++).
Step 5: While frequency size is greater than k and j is less than or equal to i decrement the frequency[string[jj] by 1. If the frequency is equal to 0 after decrementing then erase that frequency. Increment j by 1.
Step 6: After while loop add i - j + 1 to count.
Step 7: Return count.

Time and Space complexity.

  • The time complexity is O(n) because we are traversing each element in the string at most 2 times.
  • The space complexity is O(1) because the space used does not depend on the input length.

Example.

The given string is xyzyxx
The value of k is 3.
The value of j is 0.

Total sustrings with atmost 3 distinct characters.
Traverse the string from left to right.
the value of i is 0.
The current character is x.
frequency[x] is equal to 1(0 + 1)
frequency size is 1 and is not greater then 3 so skip the while loop.
count is equal to count + i - j +1
count=0 + 0 - 0 + 1.
count=1.

The value of i is 1
The current character is y.
frequency[y] is equal to 1(0 + 1)
frequency size is 2 and is not greater then 3 so skip the while loop.
count is equal to count + i - j +1
count=1 + 1 - 0 + 1.
count=3.

The value of i is 2
The current character is z.
frequency[z] is equal to 1(0 + 1)
frequency size is 3 and is not greater then 3 so skip the while loop.
count is equal to count + i - j +1
count=3 + 2 - 0 + 1.
count=6.

The value of i is 3
The current character is y.
frequency[y] is equal to 2(1 + 1)
frequency size is 3 and is not greater then 3 so skip the while loop.
count is equal to count + i - j +1
count=6 + 3 - 0 + 1.
count=10.

The value of i is 4.
The current character is x.
frequency[x] is equal to 2(1 + 1)
frequency size is 3 and is not greater then 3 so skip the while loop.
count is equal to count + i - j +1
count=10 + 4 - 0 + 1.
count=15.

The value of i is 5.
The current character is x.
frequency[x] is equal to 3(2 + 1)
frequency size is 3 and is not greater then 3 so skip the while loop.
count is equal to count + i - j +1
count=15 + 5 - 0 + 1.
count=21.

Total substrings with atmost 3 distinct characters are 21.

Similarly we will find substrings with atmost 2 distinct characters.
Traverse the string from left to right.
the value of i is 0.
The current character is x.
frequency[x] is equal to 1(0 + 1)
frequency size is 1 and is not greater then 2 so skip the while loop.
count is equal to count + i - j +1
count=0 + 0 - 0 + 1.
count=1.

The value of i is 1
The current character is y.
frequency[y] is equal to 1(0 + 1)
frequency size is 2 and is not greater then 2 so skip the while loop.
count is equal to count + i - j +1
count=1 + 1 - 0 + 1.
count=3.

The value of i is 2
The current character is z.
frequency[z] is equal to 1(0 + 1)
frequency size is 3 and is greater then 2
so decrement the frequency[x] by 1.
frequency[x] is equal to 0(1 - 0)
Erase x from map.
increment j by 1.
j is equal to 1(0 + 1)
frequency size is 2 and is not greater than 2 so break out of while loop.
count is equal to count + i - j + 1.
count= 3 + 2 - 1 + 1.
count=5.

The value of i is 3.
The current character is y.
frequency[y] is equal to 2(1 + 1).
frequency size is 2 and is not greater then 2 so skip the while loop.
count is equal to count + i - j +1
count=5 + 3 - 1 + 1.
count=8.

The value of i is 4.
The current character is x.
frequency[x] is equal to 1(0 + 1).
frequency size is 3 and is greater then 2
so decrement the frequency[y] by 1.
frequency[x] is equal to 1(2 - 1)
increment j by 1.
j is equal to 2(1 + 1)
frequency size is 3 and is greater then 2
so decrement the frequency[z] by 1.
frequency[z] is equal to 0(1 - 1)
Erase z from map.
increment j by 1.
j is equal to 3(2 + 1)
frequency size is 2 and is not greater then 2 so skip the while loop.
count is equal to count + i - j +1
count=8 + 4 - 3 + 1.
count=10.

The value of i is 5.
The current character is x.
frequency[x] is equal to 2(1 + 1)
frequency size is 2 and is not greater then 2 so skip the while loop.
count is equal to count + i - j +1
count=10 + 5 - 3 + 1.
count=13.

Total substrings with atmost two distinct characters are 13.

Total substrings with exactly three distinct characters are
Total substrings with atmost three distinct characters - Total substrings with atmost
two distinct characters
21 - 13.
8.

Total substrings with exactly three distinct character is 8.

Implementing efficient approach.

  • C++

C++

#include <iostream>
#include <map>
using namespace std;

int substring_with_atmost_k_characters(string s , int k){ if( k == 0){ return 0; } map<int,int> frequency; int n = s.length();//length of string int j=0; int count = 0; for(int i=0;i<n;i++){ frequency[s[i]]++; while(frequency.size() > k and j <= i){ int p = frequency[s[j]]--; if(p == 1){ frequency.erase(s[j]); } j++; } count += (i-j+1); } return count; }
// main function int main() { string str = "xyzyxx"; int k = 3; cout << "Total substrings with exactly " << k << " distinct characters: " << substring_with_atmost_k_characters(str,k) - substring_with_atmost_k_characters(str,k-1) << endl; return 0; }

Output.

Total substrings with exactly 3 distinct characters: 8

Hence, you have the complete knowledge to find the Number of substrings with exactly k distinct characters in linear time O(N).