Longest Common Suffix Problem

Internship at OpenGenus

Get FREE domain for 1st year and build your brand new site

In this article, we will see how we can find the longest common suffix (i.e ending) that all the strings given to us have. We shall start with the brute-force approach for two strings. Following this, we will implement the Trie data structure to solve the problem for more than two strings.

Table of content:

  1. Understanding the problem: Longest Common Suffix
  2. Brute force approach for two given strings
  3. Scaling of the brute force approach
  4. Using Trie data structure for an optimized approach

Let us understand and solve this problem.

Understanding the problem: Longest Common Suffix

Suppose we are given two strings str1 and str2 and we have to find the longest common suffix they share. For example,

String 1: "monday"
String 2: "tuesday"
Common suffix: "day"

String 1: "yo abc."
String 2: "hey i'm abc."
Common suffix: " abc."

Brute force approach for two given strings

So, the simplest approach shall be to iterate both the strings from their last character and linearly compare characters one by one. We use two different iterators for the two strings, each one initialized with a different value. Let m and n be the iterators for str1 and str2.

We also initialize a string ans, which will be appended whenever we find a character that is common in the suffix. The iteration is broken off when the characters mismatch or one or both strings have been fully traversed. This code will look like this:

string commonsuffix(string s1,string s2)
{
    int m=s1.size()-1; int n=s2.size()-1; string ans="";

    while(m>-1 && n>-1)
    {
        if(s1[m]==s2[n]){
            ans+=s1[m];
            m--; n--;
        }
        else
        {
            break;
        }
    }
    reverse(ans.begin(),ans.end());
    return ans;
}

Here, we see that the common suffix string ans is reversed before being returned. This is because as we traverse the strings backwards, the value in ans will be inverted. Example ,if str1= "abcabcd" and str2= "defabcd"

In the first iteration,
m points to the last character in str1 i.e. 'd', &
n points to the last character in str2 i.e. 'd'.
Since, str1[m]==str[n], therefore, ans="d".

In the second iteration,
m points to the last second character in str1 i.e. 'c', &
n points to the last second character in str2 i.e. 'c'.
Since, str1[m]==str[n], therefore, ans+="c", i.e. ans="dc",
which is the reverse of the common suffix which is "cd".

As the iterations continue, ans will be equal to "dcba" and after reversing,
"abcd" will be returned as the final answer.

Time complexity of this implementation is: O(n)

where n is the length of the shorter string.

Scaling of the brute force approach

If we want to scale this problem for more than two strings using this brute-force linear iteration approach, we will have to take the last letter from one of the strings, and compare it to the last letter of every other string. If all the strings have the same last letter, the s[s.size()-2] is checked for all strings. This continues on until a case where the letter varies in even a single case or one of the strings has reached its end. All common letters are appended to the answer string ans , which is later reversed and returned.

To keep track of the last indexes of all the strings given to us (in a vector<string> given), we create another vector. This vector can either contain the last index of the strings in the form of vector<int> or, we can maintain a vector of pointers of the type const_reverse_iterator where each value is initialized by given[i].crbegin() of the string. Then, on comparison, if the character is found to be common, the iterator is incremented using ++ operator.

Time complexity for this is O(n * m) where n is the number of strings in the vector given and m is the length of the shortest string.

Using Trie data structure for an optimized approach

Before we dive into this approach, we need a necessary understanding of what the Trie Data Structure is, as well as its working and implementation.

We will be implementing Trie using an unordered_map and a bool. The structure is as follows:

struct Trie
{
    // true when the node is a leaf node
    bool endofword;

    // each node stores a map to its child nodes
    unordered_map<char, Trie*> map;
};

Let's take up a simple question

If a blank space " " is encountered in a string, how will the trie process it?

The blank space will be given its own node and added as a key to the map.
The trie will skip the blank space and move on to the next character.
Undefined behaviour.
Segmentation fault.
Every blank space will be given its own node.

We will also need to define the insert() function and a getnewnode() function for our trie, to insert the given strings into it.

Code:

// Function that returns a new Trie node
Trie* getnewnode()
{
    Trie* node = new Trie;
    node->endofword = false;
   
    return node;
}

// Iterative function to insert a string into a Trie
void insert(Trie*& head, string str)
{
    if (head == nullptr) {
        head = getnewnode();
    }
    int i=0;
    // start from the root node
    Trie* curr = head;
    while (i<str.size())
    {
        // create a new node if the path doesn't exist
        if (curr->map.find(str[i]) == curr->map.end()) {
            curr->map[str[i]] = getnewnode();
        }

        // go to the next node
        curr = curr->map[str[i]];
        
        // move to the next character
        i++;
    }

    // mark the current node as a leaf
    curr->endofword = true;
}

Now that these two functions have been created, we will use them to initialize our trie with the strings given, in the main function. Since, we are aiming to find the common suffix, each string will be reversed before insertion. Code snippet:

int main()
{
    Trie* head = nullptr;
    std::vector<string> given={"monday","tuesday","wednesday","saturday","friday"};
    for(int i=0;i<a.size();i++)
    {
        reverse(a[i].begin(),a[i].end());
        insert(head,a[i]);
    }
/*
We will add more code here for calling the common suffix function.
*/
}

Since our trie is now ready, we can now solve the problem we're here for i.e. finding the common suffix. For this, we define the findcommonsuffix() function. We will take the head node of our trie, in which all the strings have been inserted after reversal. If the character is part of the common suffix,then the map will contain only that one single character, i.e. map.size()==1. So, we traverse the trie while

  1. The size of the map in the trie node is 1 and,
  2. none of the strings have reached their end, i.e. endofword==false.
    The letters which fulfil the conditions are added to our answer string ans.
    The moment either of the conditions turn false, we shall stop the traversal, and return the string ans after reversing it.

So, we have the following code:

string findcommonsuffix(Trie* head, string s)
{
    string ans="";
    if(head==nullptr){
        return ans;
    }
    Trie* curr= head; int i=0;
    while(curr->map.size()==1 && curr->endofword==false)
   {
       ans+=s[i];
       curr=curr->map[s[i]];
       i++;
   }
    reverse(ans.begin(),ans.end());
    return ans;
}

We have our function ready. So, we edit the main function and call findcommonsuffix(). Let's take a final look at our entire code.

#include <iostream>
#include <unordered_map>
#include<bits/stdc++.h>
using namespace std;

// Data structure to store a Trie node
struct Trie
{
    // true when the node is a leaf node
    bool endofword;

    // each node stores a map to its child nodes
    unordered_map<char, Trie*> map;
};

// Function that returns a new Trie node
Trie* getnewnode()
{
    Trie* node = new Trie;
    node->endofword = false;

    return node;
}

// Iterative function to insert a string into a Trie
void insert(Trie*& head, string str)
{
    if (head == nullptr) {
        head = getnewnode();
    }
int i=0;
    // start from the root node
    Trie* curr = head;
    while (i<str.size())
    {
        // create a new node if the path doesn't exist
        if (curr->map.find(str[i]) == curr->map.end()) {
            curr->map[str[i]] = getnewnode();
        }

        // go to the next node
        curr = curr->map[str[i]];

        // move to the next character
        i++;
    }

    // mark the current node as a leaf
    curr->endofword = true;
}

string findcommonsuffix(Trie* head, string s)
{
    string ans="";
    if(head==nullptr){
        return ans;
    }
    Trie* curr= head; int i=0;
   while(curr->map.size()==1 && curr->endofword==false)
   {
       ans+=s[i];
       curr=curr->map[s[i]];
       i++;
   }
 reverse(ans.begin(),ans.end());
    return ans;
}

int main()
{
    Trie* head = nullptr;
    std::vector<string> given={"Sunday", "Monday", "Tuesday", "Wednesday", "Friday", "Saturday"};
    for(int i=0;i<given.size();i++)
    {
        reverse(given[i].begin(),given[i].end());
        insert(head,given[i]);
    }
    cout<<endl<<"'"<<findcommonsuffix(head,given[0])<<"'"<<endl;
    return 0;
}

Using a trie for this problem reduces the time complexity to O(m) where m is the length of the shortest string, a reduction from the brute force approach, which was O(n * m).

However, the insertion of the strings in the trie takes O(n * m) where n is the number of words/strings and m is the average length of each string.

With this article at OpenGenus, you learnt the different ways you can approach and solve the longest common suffix problem. Keep learning!