Deleting Duplicate Characters of String

Internship at OpenGenus

Get FREE domain for 1st year and build your brand new site

We have explored the problem of Deleting Duplicate Characters of String such that the resulting string is lexicographically the smallest among all possibilities.

Table of Contents:

  1. Problem
  2. Lexicographical order
  3. Example
  4. Observations
  5. Solution 1
  6. Implementation
  7. Complexity
  8. Solution 2
  9. Implementation2
  10. Complexity2

Problem Statement

Given a string, our goal is to remove duplicate letters such that in the resulting string every letter appears exactly once and is in smallest lexicographical order of all possible results.

Example:

Input: "bcabc"
Output: "abc"

At this point it is likley you are wondering, "What on earth is the smallest
lexicographical ordering of all possible results?". This is best explained by
a quick definition and example.

Lexicographical order:

Lexicographic ordering is essentially a generalization of the alphabetical
ordering of some sequence of characters relative to some underlying totally
ordered set (in this case the alphabet that is used to write the string). In
other words, determining the larger string depends on comparing the first
unequal correpsonding character in the string in comparison to the underlying
alphabet.

Example:

The following strings are given in lexicographic order:
"Amazon"
"Breach"
"Break"
"Broke"

Note: The ordering of the strings is independent of the length of the strings, and only depends on the occurence of the smallest character first in corresponding characters across strings. Also note that the only way for two strings to have equal lexicographic ranking is if they have the same characters in the same positions (i.e equal strings)

Observations:

It is helpful to make general observations about such a problem in order to
motivate potential solutions.

  1. The first key observation is that the leftmost letter in the resulting
    string will be the smallest letter such that its suffix contains every other
    letter. This is derived from the fact that each solution has every letter and
    each solution is given in lexicographically smallest order.

  2. When iterating over the string, if the character at i is larger than the
    character at i + 1 AND the character at i occurs later in the string, then it is ALWAYS optimal to delete the character at i.

Note that both of the above observations describe making the best decision at
each step, therefore it is likely that we want our algorithm to be greedy. The above obersvations give rise to namley 2 potential solutions.

Approach

Using the two obeservations above, we can attempt to generate a general approach to solving the problem. A good place to start is to try and find a general theme amongst the two observations above. Note that for each of the observations, we first look at the character in the current (left-most) position, and then make the optimal decision. This indicates that the algorithm we are looking for must be greedy.

Solution 1

Note that observation #1 gives rise to a recursive approach. At each step in
the recursion, we can determine the smallest character such that its suffix
contains at least 1 copy of each letter in the original string. We then simply recurse on the suffix.

Steps:

  1. Loop through string to find smallest letter who's suffix contains every other letter. Recall that we want the resulting string to be in smallest lexicographic order, have no duplicates, and have all of the original letters. Choosing the smallest letter at each recursive call takes care of the lexicographic requirment. Checking that every other letter is in the suffix ensures that all of the original letters will be in the output string.
  2. Append smallest to output string. Note that we can append this letter to the output right away using Observation 1 as stated above.
  3. Recruse on the suffix: Note that recursing on the suffix will find the next smallest letter we can add with the same requirments, therefore going through all letters builds the optimal string.

Pseudocode

Let Input = s
Let hm = hashmap of s of the form {key=char: val=count}
#### Implementation
RemoveDups(s)
    smallest <- 0
    for char in smallest:
        if char < smallest:
            smallest <- i
        hm[s[i]] <- hm[s[i]] - 1
        if hm[s[i]] == 0:
            break
    return s[smallest] + removeDups(s[smallest : end])

Implementaion

from collections import Counter
class Solution1:
    def removeDups(self, s: str) -> str:
        count = Counter(s)
        smallest = 0
        for i, char in enumerate(s):
            if s[i] < s[smallest]:
                smallest = i
            count[s[i]] -= 1
            if count[s[i]] == 0:
                break
        return s[smallest] + self.removeDups(s[smallest:]
                .replace(s[smallest], "")) if s else ''

Complexity:

Note that since we are iterating over the string at each recursive step, each
call takes O(n). Since the longest non-duplicate string we can have is 26
(letters of the Latin alphabet), the number of recursive calls is bounded by
26. Therefore, Runtime = C * O(n) where C is the constant 26.

The reasoning for space complexity is similar, therfore ==Space = C * O(n)
= O(n)

Solution 2:

Note that observation #2 gives rise to another potential solution. Yet again it involves making the optimal solution at each step in the algoirithm, therefore we are looking for another greedy algorithm. In this solution, we keep a stack that stores our final answer. At each iteration, we pop as many characters as we can from the stack and add the current character. For a character to be deleted, it must be larger than the current character and it must occur later in the string.

Steps:

Let t be the char currently on the top of the stack. Let c be the char we are currently comparing.
For each char in Input:

  1. Check if c < t and t occurs somewhere else in the string. If yes, we can pop t, as it is garunteed to not be in the final string (by observation 2). Note that this is somewhat the opposite of step 1 from Solution 1. In this case we identify the letter that we know NOT to be in optimal string, and therefore we can greedily discard it. Note that we can only safley discard of a letter if it occurs later in the string. This ensures that the final output string still contains all of the original letters.
  2. Repeat 1 until c > t: Note that at this point, t should occur before c in the string since it is smaller, therefore we push c onto the top of the stack and it becomes t.
  3. push c onto stack
  4. return stack

Pseudocode

for char in input:
    if char not seen:
        while stack and c < t and t occurs_later:
            seen.discard(stack.pop())
        seen.add(c)
        stack.push(c)
  return stack as string
       

#### Implementation2:

``` python

class Solution2:
    def removeDups(self, s: str) -> str: 
        stack = []
        # keeps track of elements in solution
        seen = set()
        # checks if there are any more occurences element at i
        is_more = {c: i for i, c in enumerate(s)}

        for i, c in enumerate(s):
            if c not in seen:
                while stack and c < stack[-1] and i < is_more[stack[-1]]:
                    seen.discard(stack.pop())
                seen.add(c)
                stack.append(c)
        return ''.join(stack)
        
Complexity 2:

Note that the inner loop is bounded by the number of elements in the stack at current therefore the total Runtime = O(n)

Note that seen is a set therefore it will contain only unique elements. This means it is bounded by the number of characters in the alphabet, or a constant C. Therefore Space = O(1)

With this article at OpenGenus, you must have the complete idea of Deleting Duplicate Characters of String. Enjoy.