2.5 hour
This experiment requires you to have basic knowledge about :
And above all, a curiosity to learn and explore!
In this experiment, you will be able to do the following:
Module | Weightage | Expectation |
---|---|---|
Pre-Test | 10% | Solve All Questions |
Naive String Searching | 25% | Understand basic algorithm of Naive String Searching |
Preprocessing for LPS | 25% | Understand the Lowest-Prefix-Suffix array creation process |
KMP Algorithm | 25% | Understand the Knuth-Morris-Pratt Algorithm |
Post-test | 15% | Solve all Questions |
10 minutes
Q1. Which of these best describes a substring?
(A) A contiguous sequence of characters within a string in any order
(B) contiguous sequence of characters within a string in the same order
(C) Any combination of a subset of characters from a string in any order.
(D) Any combination of a subset of characters from a string in the same order as the string.
A substring is a contiguous sequence of characters within a string. For instance, "the best of" is a substring of "It was the best of times". This is not to be confused with subsequence, which is a generalization of substring. For example, "Itwastimes" is a subsequence of "It was the best of times", but not a substring.
Q2. Which one of the following is not a prefix of the word "TIKI-TAKA-TIK-TI"?
(A) TAKA
(B) TIKI
(C) TI
(D) TIK
TAKA is not a prefix of the word.
Q3. What does it mean when we say that an algorithm X is asymptotically more efficient than Y?
(A) X will always be a better choice for small inputs
(B) X will always be a better choice for large inputs
(C) Y will always be a better choice for small inputs
(D) X will always be a better choice for all inputs
Performance could be similar for small inputs, but definitely better for larger inputs.
Q4. int a = 0, b = 0;
for (i = 0; i < N; i++) {
a = a + rand();
}
for (j = 0; j < M; j++) {
b = b + rand();
}
What is the time and space complexity for the above code?
(A) O(N * M) time, O(1) space
(B) O(N + M) time, O(N + M) space
(C) O(N + M) time, O(1) space
(D) O(N * M) time, O(N + M) space
The first loop runs for O(n) time, second loop runs for O(m). Constant extra space is used for 2 variables a and b, hence O(1) space.
30 minutes
In this module, we will :
Efficient string searching is crucial to the way the internet functions today. They have a plethora of uses and applications:
If we were to approach this problem of searching for a pattern inside a string, the first trivial solution would be a brute-force method. We will be examining this method in the next task of this learning unit.
The brute-force method to find a pattern in a given string text is:
Image shows Iteration by Iteration Visualization of Naive basiyan
Note: It is recommended to
see the interactive demo task
before reading this.
Deciding the big O notation
for any algorithm is a
function of the worst-case
scenario of the
algorithm. Let us consider a
case - the pattern of length
M to be searched in a string
of length N always fails on
the pattern's last character
and succeeds only at the end
of the string.
Example :
String : AAAAAAAAAAAAAAAAAAAAAAX
Pattern : AAAAAX
Assuming that the time
required to compare two
characters is O(1) then the
above case will take Time =
O(M \times N)
Hence the naive string
searching algorithm is of the
complexity \(O(M \times N)\)
Interactive artefact where user can click on Next Step to understand how string searching works.
Q1. What do we do when matching of a character fails while comparing the pattern to the string?
(A) Shift the pattern by one unit
(B) Set the value of strInd to strInd - patInd + 1
(C) Set the value of patInd to 0
(D) All of the above
The pattern is shifted to the right by one by setting the patInd to 0 after setting strInd to strInd - patInd + 1
Q2. What is theoretrically the best time complexity for any string searching algorithm for searching a pattern of length \(M\) in a text of length \(N\)?
(A) \( O(N+M) \)
(B) \( O(M \times N) \)
(C) \( O(N^M) \)
(D) None of these
It is necessary to read all the characters of the text atleast once inorder to deterministically say that the pattern doesn't exist in the text.
Q3. What is the time complexity of Naive-String-Searching Algorithm?
(A) \( O(N+M) \)
(B) \( O(N \times M) \)
(C) \( O(N^M) \)
(D) None of these
Refer to "Concept and Strategy : Naive" for the explanation.
40 minutes
In this module, we will:
image for showing the overlap goes here
The name LPS indicates the
longest proper prefix which
is also a suffix of the
matched pattern.
Example:
AA in
AABAA is a prefix as well as
a suffix.
In case AABAAC is the pattern
which we are searching in
some unknown string. And we
happen to know that after
successfully matching AABAA,
C is a mismatch.
Then we can start matching
the 3rd letter of pattern(B)
to the letter at strInd
instead of starting from the
beginning.
Definition :
Let us define the LPS array
for a pattern such that
lps[i] stores the index of
the pattern from where we
must begin comparing if the
match fails after rightly
matching the pattern until
that point.
Unlike the bruteforce approach where we just shift the pattern by and start comparing again, we must use a value from the LPS Array to decide which character to compare next. The core idea is to avoid comparing the characters which we know will match correctly anyway.
Image for kmp algo made by friendly neighbourhood spiderman
strInd = 0 patInd = 0 while(strInd < string.length()) { if(pat[patInd] == string[strInd]){ if(patInd == pat.length()-1){ print("Match Found at" + strInd-pat.length() ) return } strInd++; patInd++; } else { if(patInd != 0){ patInd = LPS[patInd-1] } else { strInd++; } } }
Interactive artefact where user can click on Next Step to understand how the kmp string searching algorithm works.
Interactive artefact where user can click on Next Step to understand how the kmp string searching algorithm works.
Interactive artefact where user can click on Next Step to understand how the kmp string searching algorithm works.
Q1. Which one of the following is true in KMP String searching algorithm?
(A) The value of strInd always increases in all iterations.
(B) The value of strInd remains same in all the iterations.
(C) The value of strInd may or may not increase in all the iterations but it never decreases.
(D) None of the above
Refer "Concept and Strategy : KMP " for explanation.
Q2. How does KMP improve the brute-force-method?
(A) By comparing the characters right to left instead of left to right.
(B) By searching for text in pattern instead of pattern in string
(C) By not doing reduntant comparisons and using the information gained from previous comparisons
(D) None of the above
Refer to "Concept and Strategy : KMP"
Q3. What is the time complexity of KMP Algorithm?
(A) \( O(M \times N) \)
(B) \( O(M ^ N) \)
(C) \( O(M + N) \)
(D) All of the above
The complexity of this algorithm will be discussed in detail in the next learning unit.
35 minutes
In this module, we will be learning about :
Image here
function LPS_ArrayGenerator(string Pat){ LPS = Array[Pat.length] j = 0 LPS[0] = 0 for(int i=1 ; i < Pat.length() ; i++){ while( j > 0 && Pat[i] != Pat[j] ) { j = LPS[j-1] } if(Pat[i] == Pat[j]){ j++ } LPS[i] = j } return LPS }
Interactive artefact where user can click on Next Step to understand how the kmp string searching algorithm works.
Q1. What is the time complexity of the prepropressing function for a pattern of length \(M\) ?
(A) \(O(M^2)\)
(B) \(O(M^3)\)
(C) \(O(\sqrt{M})\)
(D) \(O(M)\)
Refer to "Concept and Strategy : Preprocessing"
Q2. What is the longest-prefix-suffix for: "ANIDAPOPOANIDA"?
(A) 5
(B) 6
(C) 2
(D) 3
ANIDA is the longest prefix which is also a suffix.
Q3. What is the longest-prefix-suffix for TOOTOOTOOT?
(A) 1
(B) 4
(C) 5
(D) 2
TOOT is the longest prefix suffix.
10 minutes
Post Test includes questions on entire experiment concepts. Read the questions in the quiz section and select the correct option from the ones provided. Please note that some questions may have more than one correct response.
Q1. What is the overall complexity of the KMP-String-Searching algorithm?
(A) \(O(N \times M)\)
(B) \(O(N^2)\)
(C) \(O(N+M)\)
(D) None of the above
Refer to the Analysis task in preprocessing learning Unit.
Q2. What is the space complexity for the KMP String Searching algorithm?
(A) \(O(M)\)
(B)\(O(N)\)
(C) \(O(N+M)\)
(D) None of the above
\(O(M)\) to store the LPS Array.
Q3. What is the LPS array for the pattern: "ababbabbabbababbabb" ?
(A) 001201201001232101
(B) 0012012012012345678
(C) 012344567891230120
(D) 012344567891230123
Enter the above pattern in the preprocessing demonstration pane to get a thorough understanding.
Q4. Find the LPS array for "AABAAABAA"?
(A) 010122345
(B) 010000123
(C) 012345678
(D) 010010010
Enter the above pattern in the preprocessing demonstration pane to get a thorough understanding.
Q5. Using the LPS array from the previous question, which character would you start comparing from if you know that the matching failed at AABAAABAA?
(A) 0
(B) 1
(C) 6
(D) 3
A look at the LPS array(010122345) would reveal to us that AABAAAB has longest-suffix-prefix of length 3.
You can explore more about the KMP string searching algorithm through the following resources :
Useful Links : Reference Texts :