Prefix table/ LPS in KMP algorithm and its applications
Get FREE domain for 1st year and build your brand new site
In this article, we have presented the optimal linear time algorithm to compute the Prefix table/ LPS (Longest Prefix Suffix) which is used in KMP algorithm along with other key applications.
Table of contents:
 Introduction to Prefix table/ LPS
 Optimal Algorithm for Prefix Table
 Example for Creating KMP Algorithm's LPS (Prefix Table)
 Implementations of Prefix table/ LPS
 Time & Space Complexity
 Applications of Prefix Table/ LPS
Let us get started with Prefix table/ LPS in KMP algorithm and its applications.
Introduction to Prefix table/ LPS
Prefix table (also known as LPS/ Longest Prefix Suffix) is an array data structure which captures the longest prefix which is also a suffix for every substring starting at index 0.
This data structure was first used in KMP algorithm which is used in find a pattern in a given set of strings. The algorithm compares character by character and uses the Prefix table/ LPS to make a decision to skip few characters when a mismatch occurs. Prefix table is computed as a part of preprocessing and the key is to compute it in linear time O(N) where N is the length of the pattern for which LPS is calculated.
Definition of LPS:
LPS = "Longest Proper Prefix which is also Suffix"
LPS[i] = MAXIMUM(j) such that string[0 to j1] == string[ij+1 to i]
that is: longest substring from index 0 which also, ends at index i.
This simple pattern stored in LPS can be used to accelerate several algorithm and also, enable a linear time string pattern matching algorithm known as KMP algorithm.
The brute force approach to compute values of LPS take O(N^{2}) time. This is an overhead for most algorithms.
An optimal algorithm exists which can compute LPS in linear time O(N) and this makes the use of Prefix table LPS widespread.
Optimal Algorithm for Prefix Table

Define one dimentional array with the size equal to the length of the pattern (LPS[pattern.length])

Define variables i and j and set LPS[0]=0.

compare the characters in the pattern at the pattern[i] and pattern[j].

If both the characters matches then set LPS[j]=i+1 and increment both i and j values by 1.
GOTO STEP 3 
If both the characters do not match
 check the value of i
 if its '0' then set LPS[j]=0 and increment 'j' value by one
 if its not '0' then set i=LPS[i1].
GOTO STEP 3

Repeat Above steps untill all the characters of the pattern are traversed.
Example for Creating KMP Algorithm's LPS (Prefix Table)
Let the given pattern be
Index:  0  1  2  3  4  5  6 

String:  A  B  C  D  A  B  D 
Let us define LPS[] array with size =length of pattern which is 7
Index:  0  1  2  3  4  5  6 

LPS: 
 step1=Define variables i & j. Set i=0,j=1 and LPS[0]=0.
i=0 and j=1
Index:  0  1  2  3  4  5  6 

LPS:  0 
 step2=Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one.
i=0 and j=2
Index:  0  1  2  3  4  5  6 

LPS:  0  0 
 step3= Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one.
i=0 and j=3
Index:  0  1  2  3  4  5  6 

LPS:  0  0  0 
 step4=Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one.
i=0 and j=4
Index:  0  1  2  3  4  5  6 

LPS:  0  0  0  0 
 step5=Compare Pattern[i] and PAttern[j] .Since both are matching set LPS[j]=i+1 and increment 'j' and 'i' value by one.
i=1 and j=5
Index:  0  1  2  3  4  5  6 

LPS:  0  0  0  0  1 
 step6=Compare Pattern[i] and PAttern[j] .Since both are matching set LPS[j]=i+1 and increment 'j' and 'i' value by one.
i=2 and j=6
Index:  0  1  2  3  4  5  6 

LPS:  0  0  0  0  1  2 
 step7=Compare Pattern[i] and PAttern[j] .Since both are not matching and i!=0 ,we need to set i= LPS[i1]===> LPS[21]=LPS[1]=0.
i=0 and j=6
Index:  0  1  2  3  4  5  6 

LPS:  0  0  0  0  1  2 
 step8=Compare Pattern[i] and PAttern[j] .Since both are not matching and also "i==0" ,we need to set LPS[j]=0 and increment 'j' value by one.
Index:  0  1  2  3  4  5  6 

LPS:  0  0  0  0  1  2  0 
Here LPS[] is filled with all values so we stop the process .The final LPS[] table
is as follows..
Index:  0  1  2  3  4  5  6 

LPS:  0  0  0  0  1  2  0 
Implementations of Prefix table/ LPS
Following is the implementation of our linear time algorithm for creating LPS array in C++:
vector<int> prefixFunction(string s) {
vector<int> p(s.size());
int j = 0;
for (int i = 1; i < (int)s.size(); i++) {
while (j > 0 && s[j] != s[i])
j = p[j1];
if (s[j] == s[i])
j++;
p[i] = j;
}
return p;
}
Following is the implementation of our linear time algorithm for creating LPS array in Python:
p='ababaca'
l1 = len(p)
j = 0
i = 1
prefix = [0]
while len(prefix) < l1:
if p[j] == p[i]:
prefix.append(j+1)
i += 1
j += 1
else:
if j == 0:
prefix.append(0)
i += 1
if j != 0:
j = prefix[j1]
print prefix
Following is the implementation of our linear time algorithm for creating LPS array in Java:
String pattern = "ababaca";
int i = 1, j = 0;
int[] prefixArray = new int[pattern.length];
while (i < pattern.length) {
while (pattern.charAt(i) != pattern.charAt(j) && j > 0) {
j = prefixArray[j  1];
}
if (pattern.charAt(i) == pattern.charAt(j)) {
prefixArray[i] = j + 1;
i++;
j++;
} else {
prefixArray[i] = j;
i++;
}
}
for (int k = 0; k < prefixArray.length; ++k) {
System.out.println(prefixArray[k]);
}
Time & Space Complexity
 Best case time complexity:
Î˜(n)
 Space complexity:
Î˜(n)
where n is the length of the string
Applications of Prefix Table/ LPS
Applications of Prefix Table/ LPS are:
 Search for a substring in a string. The KnuthMorrisPratt algorithm
 Counting the number of occurrences of each prefix
 The number of different substring in a string
 Compressing a string
 Building an automaton according to the prefix function
Question
What is the worst case time complexity of KMP algorithm for pattern searching (m = length of text, n = length of pattern)?
With this article at OpenGenus, you must have a strong idea of how to calculate LPS array which is used in KMP algorithm efficiently.