Longest substring without repeating characters

Free Linux Book

Get FREE domain for 1st year and build your brand new site

In this article, we have explained three approaches to the problem of finding Longest substring without repeating characters. This involves the use of Hash Map with the clever use of start and end index.

Table of contents:

  1. Problem statement: Longest substring without repeating characters
  2. Method 1: Brute Force
  3. Method 2: Using HashMap
  4. Method 3: HashMap with a trick
  5. Applications of this problem

Prerequisite: String Algorithms

Let us get started with Longest substring without repeating characters.

Problem statement: Longest substring without repeating characters

In this problem, we are given a string, and we need to find the longest substring without repeating characters in that given string.

For example, let's assume that we are given string ABCFDGFEGOJF. We look at subparts of this string, also known as substrings. For every possible substring, we need to check for repeating characters and make sure that we only look for that substring that is longest and has continuous unique characters. Being continuous implies that in a possible substring, we don't skip characters.

invalid-substrings

For example, substrings like BCFG or FEGJ are not valid substrings because they are not in a continuous order as per the original string, and some characters have been skipped out. Substrings like FDGF or FEGOJF are not valid as some characters are observed more than once.

longest-substring

Therefore, in this example, the only longest possible substring is ABCFDG, with a length of 6.

We can solve this problem using various approaches. A few of them are discussed below.

Method 1: Brute Force

A simple approach is to check all substrings possible from the given string using a nested for loop, and further check if these substrings have any duplicate characters or not. To execute this algorithm, implement the following steps:

  1. Create a variable to keep track of the longest length (longest_len in the below example) of a substring so far encountered. Likewise, also create a variable to store the longest substring (longest_str).

  2. Create a loop to traverse all characters of the given substring. Then create another string variable inside to keep track of each substring encountered up till now.

  3. Create another loop to traverse every character of this substring. Check to see if every character of this substring doesn't already exist using an if statement (implies unique character), and if so, add this character to the string variable created in the previous step. Now create a variable to keep track of the length of the current substring. Compare the length of the string previously and currently encountered (using max() function). Whichever value is greater, assign that to the longest_len variable.

  4. Likewise, using another if statement, nested inside the if statement in step 3, check the length of the previous string and current string, and if the current string's length is greater, assign this string value of the longest_str variable, which we created to store the longest substring encountered thus far. Finally, if we encounter duplicate characters, we use the break keyword to break out of the inner for loop and move to the next substring.

  5. In the end, we print the final longest substring and its length.

Implementation Example

The following program implements the algorithm discussed above.

# Part of iq.opengenus.org
def longest_substring(s):
    
    # variable to track longest length of a substring encountered thus far
    longest_len = 0
    
    # variable to track longest lengthed substring encountered thus far
    longest_str = ''
    
    # loop to traverse all characters of the given string
    for i in range(len(s)):
    
        # variable to track current substring
        current_str = ""
        
        # loop to traverse all elements in current substring
        for j in range(i, len(s)):
        
            # check if every character is unique 
            if s[j] not in current_str:
            
                # add unique continuous character to current substring 
                current_str += s[j]
                
                # another variable to track length of current substring
                curr_len = longest_len
                
                # check whether current substring's length is greater
                # than previous substring's length 
                # assign that value whichever is greater
               
                longest_len = max(longest_len, len(current_str))
                
                # if current substring's length is greater
                # assign this substring as the longest substring
                if longest_len > curr_len:
                    longest_str = current_str
            # break out of the loop incase of duplicate characters
            else:
                break
    # print the longest substring and its length
    print(f'The longest substring: {longest_str}, length: {longest_len}')

# example to test the above longest substring program
x = 'ACDBVWGVRWFRADS'
longest_substring(x)

The above program produces the output shown below.

The longest substring: ACDBVWG, length: 7

Complexity

  • Time: total number of substrings which can be created from a given string is given by (n*(n+1))/2. On top of that assume that each substring has n number of characters. Therefore, the total runtime comes out to be O(n^3). In the best case, considering the string itself is the only substring, then the runtime would be O(n^2).
  • Space: the space requirement will stay the same irrespective of the size of the input, in either the best or worst case, so the space complexity is O(1).

Method 2: Using HashMap

We can solve this problem using the dictionary data type (Hash Map) in Python. This increases the efficiency and runtime improves because of less look time in dictionary type. We first create an empty dictionary to keep track of every character as the key and their respective index as value in the dictionary.

Additionally, keep track of the longest substring length, longest substring itself and the starting index of each substring and loop over all characters of the given string simultaneously. All the while, we keep checking for the longest substring and its length at the end of the loop. To implement this algorithm, the following steps are required-

  1. Create an empty dictionary. Now create a variable to keep track of the longest length (longest_len in this example) and initialize it to 0 for now. Also create a variable to keep track of this longest substring and keep it an empty string for now (longest_str). Then create a variable to keep track of the starting index of the substring (start) and initialize it to -1 as the index starts at 0.

  2. Create a for loop to traverse all the elements of the given string. Using an if statement, check if each character of this string is in the dictionary and whether its value (the index) is greater than the start variable. If it is, it implies this character is already in the dictionary. So we set the starting index (start variable) to be equal to the value of this character. If this character is not in the dictionary, we directly set the dictionary key to be equal to this character and the dictionary value as the index of this character.

  3. Then using another if statement, we check if the current length of the substring (given by i-start) is greater than the length of the previous substring. If it's true, we set the longest substring to be equal to this current substring.

  4. Finally, we set the longest_len variable to store the value length of the current substring or previous substring, whichever is greater, using the max() function.

  5. Then we print the length of the longest substring and the substring itself.

Implementation Example

The code below implements the above algorithm.

# Part of iq.opengenus.org
def longest_Substring(s):
    # keep track of every char using dictionary
    dictn = dict()
    longest_len = 0
    longest_str = ''

    # indicates starting position of substring
    # initially we set it to -1
    start = -1
    for  i  in range(len(s)):
        # check if character is already in the dictionary
        if s[i] in dictn and dictn[s[i]] > start:
            start = dictn[s[i]]
        # setting all characters in dictionary
        dictn[s[i]] = i
        # i- start gives length of the current substring
        # check whether current substring's length is greater than previous
        if i-start > longest_len:
                # we do start+1 as start is -1 and
                # when we set it to dictn[s[i]], 
                # it includes the repeating char
                longest_str = s[start+1: i+1]
        # set longest length equal to previous or current substring's length
        # whichever is greater
        longest_len = max(longest_len, (i - start))
    print (f'The longest substring: {longest_str}, length: {longest_len}')   


# example to test the longest_substring program
x = 'ACDBVWGVRWFRADS'
longest_Substring(x)

The following output is produced by the above program.

The longest substring: ACDBVWG, length: 7

Complexity

  • Time: since we use a dictionary in this algorithm, the runtime significantly increases. In the worst case, it will be O(n) runtime, because we'd need to traverse all characters of the string only once, and in the best case, there'd be a constant runtime, if there's only one character in the string.

  • Space: irrespective of the size of the string, the space requirement would be the same as in the dictionary, we are only storing distinct characters. The space complexity would therefore be O(1), in either best or worst case.

Method 3: HashMap with a trick

Another way to solve this problem is using the dictionary data type but this time, we use the starting index variable and the ending index variable to check whether we added an item to the dictionary or not. If we add an item, we increment the ending index variable, and if find a repeating character, we shift the starting index to the next position. This way we ignore the previous characters, each time we find a repeating character. This increases efficiency and saves space. To implement this algorithm, follow the below steps-

  1. Create a variable to keep track of the starting index of the substring (in this case, i), and likewise, an ending index variable (j). Also, create a variable to keep track of the longest substring length found so far. Initialize all these variables to 0.

  2. Now create an empty dictionary, which we'll use to traverse all characters in the given string.

  3. Create an empty string variable to keep track of the longest string found.

  4. Create a while loop to traverse all the characters of the given string till the end. Then using an if statement, check whether the character is in the dictionary or not. In case it isn't, add the character and its index as a key-value pair in the dictionary. Now, we check using another if statement, if the length of the current substring is greater than the previous substring. Initially, it would always be true. If it is, we include this character for now. After adding the character and its index in the dictionary, we increment the ending variable. This way we only consider two characters at a time. Now we set the longest length equal to the length of the previous or current substring, whichever is greater.

  5. In case we find a repeating character, we delete the character just before this position and shift the starting index variable forward, so that we only consider two characters at a time.

  6. Finally, we print the longest substring with its length.

Implementation example

The following program executes the algorithm discussed above.

# Part of iq.opengenus.org
def longest_substring(s):

    # end is j, i is start
    # increase j to imply that length increases
    # increase i to shift the starting index
    i = j = longest_len = 0
    chars = dict()

    # empty string variable 
    longest_str = ''
    while j < len(s):
        if s[j] not in chars:
            # add character if unique
            chars[s[j]] = j

            # if current substring's length is greater,
            # include the newly added character to the empty string
            if len(chars)> longest_len:
                longest_str += s[j]
            # increment ending index variable
            # so that we only consider two elements at a time
            j = j + 1

            # set longest length equal to previous or current substring's length
            # whichever is greater
            longest_len = max(longest_len, len(chars))
            
            
        else:
            # delete the character just before the repeating character
            # increment the starting index variable
            # so that we only consider, two elements at a time

            del chars[s[i]]
            i = i + 1
    # print the longest substring with its length
    print(f'The longest substring: {longest_str}, length: {longest_len}')
    

# example to test the longest substring program
x = 'ACDBVWGVRWFRADS'
longest_substring(x)

The above program produces the following output.

The longest substring: ACDBVWG, length: 7

Complexity

  • Time: since we again use a dictionary in this method, so the runtime in the worst case would be O(n) and in the best case O(1).

  • Space: the space requirement would also be constant in either best or worst case, so the space complexity would be O(1).

Applications of this problem

  • String manipulation and lookup are used by many algorithms in computer science, which in turn are widely used in applications like plagiarism checker, file scanning, image to text conversions, search engines, spell checker programs etc.
  • In many areas of biology and biotechnology, particularly DNA sequence analysis and related areas that require biological data, string algorithms are widely applicable.

With this article at OpenGenus, you must have the complete idea to find the Longest substring without repeating characters.