Minimum deletions to make frequency of each character unique

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

Reading time: 25 minutes | Coding time: 10 minutes

The problem we will solve is that given a string s, we need to find the minimum number of characters to be deleted from s so that the frequency of each character in the string is unique.

The overall time complexity of our efficient approach will be O(N + M^2) where N is the number of characters in the string and M is the number of characters in the character set (26 for English). The space complexity will be O(M).

Problem

You are given a string s. You have to find the minimum number of characters to be deleted so that frequency of each character becomes unique.

Note that all characters in the string are lowercase.

Given string s = 'ccdacdd'
Note the frequency of characters : { c:3, a:1, d:3 }
Now, as c and d both have same frequency 3, we need to delete one character from c or d, to make their frequency different. After deleting any character from c or d we will get frequency as 1,2 and 3, as they all are different we got our solution as 1.

Approach

First of all, we need to count the frequency of all dictinct characters present in our string. Then we store each frequency corresponding to number of characters having that frequency, in a dictionary(python).

After that we traverse the dictionary from higher frequency to lower and if we find a frequency that has more than one character with it, we delete one occurence of the character and increment the number of character corresponding to frequency one less. This process is repeated till we check all the frquencies.

Pseudocode

Create a list of length 26 with elements as 0.
Traverse the given string and increment elements of list having position as ASCII code of character - ASCII code of (a).
Create a dictionary and traverse the list storing each frequency which is greater than 0.
If any frequency repeats, we increment the value corresponding to it by 1.
Then we sort our dictionary in a reversed order and start traversing.
If we find a frequency having more than one character associated with it, we start decrementing the number of characters by 1 and incrementing the number of characters associated with frequency one less, also we increment our deletion variable by 1.
Return the deletion variable when we finished traversing.

Implementation in python

def find_answer(s):

    # stores frequency of each character
    frequency_list = [0] * 26

    for i in range(len(s)):
        frequency_list[ord(s[i]) - ord('a')] += 1

    # stores each frequency and corresponding number of characters
    frequency_count = dict() 

    for i in range(26):
        if (frequency_list[i] != 0):
            frequency_count[frequency_list[i]] = frequency_count.get(frequency_list[i], 0) + 1

    # number of deletions required
    deletions = 0
    
    # Iterate dictionary from higher frequency to lower
    freq = list(frequency_count.keys())
    freq.sort(reverse=True)
    for i in freq:
        if (i == 0):
            break
        characters = frequency_count[i]
        while (x > 1):
            deletions += 1
            x -= 1
            if (i - 1) in frequency_count.keys():
                frequency_count[i - 1] += 1

    return deletions 

s = "opengenus"
print(find_answer(s))

Workflow of code

suppose we have a string s = 'opengenus', we pass it to our find_answer function
Then we create a list of length 26 and we traverse through our string, and based on the ASCII code of our character we increment elements of our list by 1. After we traverse through 'opengenus'. our list will loot like : [0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0] showing frequency of characters in string.
Then we traverse through our list and create a dictionary storing each dictinct frequency and number of characters having that frequency. Our dictionary will look like : {2: 2, 1: 5} means there are 5 characters which have frequency 1 and 2 characters having frequency 2.
Now we make a list of frequencies and sort it in reverse order : [2, 1]
Then we iterate through our list and if the frequency have more than 1 character referring to it, we increment our deletion variable and number of characters of referring to frequency-1. Like in our case, we see that frequency 2 has 2 characters referring it, we decrement 2 by 1 and increment number of characters having frequency 1, so our dictionary looks like : {2: 1, 1: 6}. So we need to delete 1 character in case of frequency 2.
Now the same is repeated with next frequency, means frequency 1, till that 6 reaches 1, so we need to delete 5 characters in case of frequency 1.
So, our final deletion variable will have value 6 and we will return that as our answer. So we need to delete at least 6 characters from 'opengenus', then we will get all characters having unique frequency.

Complexity

To count the frequency of all characters in given string, we traverse through whole string, so time complexity for this is O(n) where n is length of string.
To create the frquency_count dictionary,time complexity is O(m), where m=26.
To delete a character, our time complexity goes as O(m * m).
So, our overall time complexity will be O(n + m * m).
Our space complexity here will be O(m) where m = 26, as the list and the dictionary we created will have maximum length of 26.

With this, you will have the complete knowledge of this wonderful algorithm. Enjoy.