Prefix table/ LPS in KMP algorithm and its applications

In this article, we have presented the optimal linear time algorithm to compute the Prefix table/ LPS (Longest Prefix Suffix) which is used in KMP algorithm along with other key applications.

Table of contents:

  1. Introduction to Prefix table/ LPS
  2. Optimal Algorithm for Prefix Table
  3. Example for Creating KMP Algorithm's LPS (Prefix Table)
  4. Implementations of Prefix table/ LPS
  5. Time & Space Complexity
  6. Applications of Prefix Table/ LPS

Let us get started with Prefix table/ LPS in KMP algorithm and its applications.

Introduction to Prefix table/ LPS

Prefix table (also known as LPS/ Longest Prefix Suffix) is an array data structure which captures the longest prefix which is also a suffix for every substring starting at index 0.

This data structure was first used in KMP algorithm which is used in find a pattern in a given set of strings. The algorithm compares character by character and uses the Prefix table/ LPS to make a decision to skip few characters when a mismatch occurs. Prefix table is computed as a part of pre-processing and the key is to compute it in linear time O(N) where N is the length of the pattern for which LPS is calculated.

Definition of LPS:

LPS = "Longest Proper Prefix which is also Suffix"

LPS[i] = MAXIMUM(j) such that string[0 to j-1] == string[i-j+1 to i]

that is: longest sub-string from index 0 which also, ends at index i.

This simple pattern stored in LPS can be used to accelerate several algorithm and also, enable a linear time string pattern matching algorithm known as KMP algorithm.

The brute force approach to compute values of LPS take O(N2) time. This is an overhead for most algorithms.

An optimal algorithm exists which can compute LPS in linear time O(N) and this makes the use of Prefix table LPS widespread.

Optimal Algorithm for Prefix Table

  • Define one dimentional array with the size equal to the length of the pattern (LPS[pattern.length])

  • Define variables i and j and set LPS[0]=0.

  • compare the characters in the pattern at the pattern[i] and pattern[j].

  • If both the characters matches then set LPS[j]=i+1 and increment both i and j values by 1.
    GOTO STEP 3

  • If both the characters do not match

    • check the value of i
    • if its '0' then set LPS[j]=0 and increment 'j' value by one
    • if its not '0' then set i=LPS[i-1].
      GOTO STEP 3
  • Repeat Above steps untill all the characters of the pattern are traversed.

Example for Creating KMP Algorithm's LPS (Prefix Table)

Let the given pattern be

Index: 0 1 2 3 4 5 6
String: A B C D A B D

Let us define LPS[] array with size =length of pattern which is 7

Index: 0 1 2 3 4 5 6
LPS:
  1. step1=Define variables i & j. Set i=0,j=1 and LPS[0]=0.
    i=0 and j=1
Index: 0 1 2 3 4 5 6
LPS: 0
  1. step2=Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one.
    i=0 and j=2
Index: 0 1 2 3 4 5 6
LPS: 0 0
  1. step3= Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one.
    i=0 and j=3
Index: 0 1 2 3 4 5 6
LPS: 0 0 0
  1. step4=Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one.
    i=0 and j=4
Index: 0 1 2 3 4 5 6
LPS: 0 0 0 0
  1. step5=Compare Pattern[i] and PAttern[j] .Since both are matching set LPS[j]=i+1 and increment 'j' and 'i' value by one.
    i=1 and j=5
Index: 0 1 2 3 4 5 6
LPS: 0 0 0 0 1
  1. step6=Compare Pattern[i] and PAttern[j] .Since both are matching set LPS[j]=i+1 and increment 'j' and 'i' value by one.
    i=2 and j=6
Index: 0 1 2 3 4 5 6
LPS: 0 0 0 0 1 2
  1. step7=Compare Pattern[i] and PAttern[j] .Since both are not matching and i!=0 ,we need to set i= LPS[i-1]===> LPS[2-1]=LPS[1]=0.
    i=0 and j=6
Index: 0 1 2 3 4 5 6
LPS: 0 0 0 0 1 2
  1. step8=Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one.
Index: 0 1 2 3 4 5 6
LPS: 0 0 0 0 1 2 0

Here LPS[] is filled with all values so we stop the process .The final LPS[] table
is as follows..

Index: 0 1 2 3 4 5 6
LPS: 0 0 0 0 1 2 0

Implementations of Prefix table/ LPS

Following is the implementation of our linear time algorithm for creating LPS array in C++:

vector<int> prefixFunction(string s) {
vector<int> p(s.size());
int j = 0;
for (int i = 1; i < (int)s.size(); i++) {
    while (j > 0 && s[j] != s[i])
        j = p[j-1];

    if (s[j] == s[i])
        j++;
    p[i] = j;
}   
return p;
}

Following is the implementation of our linear time algorithm for creating LPS array in Python:

p='ababaca'
l1 = len(p)
j = 0
i = 1
prefix = [0]
while len(prefix) < l1:
if p[j] == p[i]:
    prefix.append(j+1)
    i += 1
    j += 1
else:
    if j == 0:
        prefix.append(0)
        i += 1
    if j != 0:
        j = prefix[j-1]
print prefix

Following is the implementation of our linear time algorithm for creating LPS array in Java:

String pattern = "ababaca";
int i = 1, j = 0;
int[] prefixArray = new int[pattern.length];
while (i < pattern.length) {
while (pattern.charAt(i) != pattern.charAt(j) && j > 0) {
        j = prefixArray[j - 1];
}
if (pattern.charAt(i) == pattern.charAt(j)) {
        prefixArray[i] = j + 1;
        i++;
        j++;

    } else {
        prefixArray[i] = j;
        i++;
    }
}
for (int k = 0; k < prefixArray.length; ++k) {
    System.out.println(prefixArray[k]);
}

Time & Space Complexity

  • Best case time complexity: Θ(n)
  • Space complexity: Θ(n)
    where n is the length of the string

Applications of Prefix Table/ LPS

Applications of Prefix Table/ LPS are:

  • Search for a substring in a string. The Knuth-Morris-Pratt algorithm
  • Counting the number of occurrences of each prefix
  • The number of different substring in a string
  • Compressing a string
  • Building an automaton according to the prefix function

Question

What is the worst case time complexity of KMP algorithm for pattern searching (m = length of text, n = length of pattern)?

O(m)
O(n*m)
O(n)
O(logn)
O(m) is the worst case as well.

With this article at OpenGenus, you must have a strong idea of how to calculate LPS array which is used in KMP algorithm efficiently.