Minimum insertions to make the frequency of each character unique


Reading time: 30 minutes | Coding time: 10 minutes

This problem is similar to 'Minimum deletions to make frequency of each character unique'. Checkout that article as well to get complete knowledge about this problem.

The problem we will solve is that given a string s, we need to find the minimum number of characters to be inserted in s so that the frequency of each character in the string is unique.

The overall time complexity of our efficient approach will be O(N + M^2) where N is the number of characters in the string and M is the number of characters in the character set (26 for English). The space complexity will be O(M).

Problem:

You are given a string s. You have to find the minimum number of characters to be inserted so that frequency of each character becomes unique.

Note that all characters in the string are lowercase.

Given string s = 'ccdacdd'

Note the frequency of characters : { c:3, a:1, d:3 }
Now, as c and d both have same frequency 3, we need to insert one character, either c or d, to make their frequency different. After inserting any character either c or d we will get frequency as 1,3 and 4, as they all are different we got our solution as 1.

Approach

First of all, we need to count the frequency of all dictinct characters present in our string. Then we store each frequency corresponding to number of characters having that frequency, in a dictionary(python).

After that we traverse the dictionary from lower frequency to higher and if we find a frequency that has more than one character with it, we insert one occurence of the character and increment the number of character corresponding to frequency one more. This process is repeated till we check all the frquencies.

Pseudocode

  • Create a list of length 26 with elements as 0.
  • Traverse the given string and increment elements of list having position as ASCII code of character - ASCII code of (a).
  • Create a dictionary and traverse the list storing each frequency which is greater than 0.
  • If any frequency repeats, we increment the value corresponding to it by 1.
  • Then we sort our dictionary and start traversing.
  • If we find a frequency having more than one character associated with it, we start incrementing the number of characters by 1 and incrementing the number of characters associated with frequency one more, also we increment our insertion variable by 1.
  • Return the insertion variable when we finished traversing.

Implementation in Python

Following is the implementation in Python:

def find_answer(s):

    # stores frequency of each character
    frequency_list = [0] * 26

    for i in range(len(s)):
        frequency_list[ord(s[i]) - ord('a')] += 1

    # stores each frequency and corresponding number of characters
    frequency_count = dict() 

    for i in range(26):
        if (frequency_list[i] != 0):
            frequency_count[frequency_list[i]] = frequency_count.get(frequency_list[i], 0) + 1

    # number of deletions required
    insertions = 0
    
    # Iterate dictionary from higher frequency to lower
    freq = list(frequency_count.keys())
    freq.sort()
    for i in freq:
        if (i == 0):
            break
        x = frequency_count[i]
        while (x > 1):
            insertions += 1
            x -= 1
            LetterFreqMap[it] -= 1
            if (it + 1) in LetterFreqMap.keys():
                LetterFreqMap[it + 1] += 1
            else:
                LetterFreqMap[it + 1] = 1
                a.append(it+1)

    return insertions 

s = "tests"
print(find_answer(s))

Workflow of code

  • suppose we have a string s = 'testsggg', we pass it to our find_answer function.
  • Then we create a list of length 26 and we traverse through our string, and based on the ASCII code of our character we increment elements of our list by 1. After we traverse through 'testsggg'. our list will loot like : [0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0] showing frequency of characters in string.
  • Then we traverse through our list and create a dictionary storing each dictinct frequency and number of characters having that frequency. Our dictionary will look like : {2: 2, 1: 1, 3: 1} means there are 2 characters which have frequency 2 and 1 characters having frequency 1 and 3.
  • Now we make a list of frequencies and sort it : [1, 2, 3]
  • Then we iterate through our list and if the frequency have more than 1 character referring to it, we increment our insertion variable and number of characters of referring to frequency+1. Like in our case, we see that frequency 2 has 2 characters referring it, we decrement 2 by 1 and increment number of characters having frequency 3 if available, so our dictionary looks like : {2: 1, 1: 1, 3: 2}. So we need to insert 1 character in case of frequency 2.
  • Now the same is repeated with next frequency, means frequency 3, till that 2 reaches 1, so we need to insert 1 character again in case of frequency 3.
  • So, our final deletion variable will have value 2 and we will return that as our answer. So we need to insert atleast 2 characters in 'testsggg', then we will get all characters having unique frequency.

Complexity

  • To count the frequency of all characters in given string, we traverse through whole string, so time complexity for this is O(n) where n is length of string.
  • To create the frquency_count dictionary,time complexity is O(m), where m=26.
  • To delete a character, our time complexity goes as O(m * m).
  • So, our overall time complexity will be O(n + m * m).
  • Our space complexity here will be O(m) where m = 26, as the list and the dictionary we created will have maximum length of 26.

With this, you will have complete idea of this problem and will be able to solve it efficiently. Enjoy.