KMP Algorithm

Building Intuition

  • Let us think of a method by which we can make our bruteforce method more efficient. We notice that we have been doing redundant matching when the matching of a pattern in a string fails since we are moving the string Index back. Can we possibly track the information we get from previous comparisons?
  • We have to exploit the degenerating property of the pattern we are searching. i.e pattern having same sub-patterns appearing more than once in the pattern.

Finding Repeating Substrings

The Longest-Prefix-Suffix Array

The name LPS indicates the longest proper prefix which is also a suffix of the matched pattern.

Example:

AA in

AABAA is a prefix as well as a suffix.

In case AABAAC is the pattern which we are searching in some unknown string. And we happen to know that after successfully matching AABAA, C is a mismatch.

Then we can start matching the 3rd letter of pattern(B) to the letter at strInd instead of starting from the beginning.

Definition : Let us define the LPS array for a pattern such that lps[i] stores the index of the pattern from where we must begin comparing if the match fails after rightly matching the pattern until that point.