Given two strings
Output: 0
Explanation: "sad" occurs at index 0 and 6. The first occurrence is at index 0, so we return 0.
Output: -1
Explanation: "cscd" did not occur in "cscodeio", so we return -1.
Constraints:
1 <= haystack.length, needle.length <= 104 haystack andneedle consist of only lowercase English characters.
Contents
Solution 1 - Using two pointers, match characters from both strings Solution 2 - Using KMP (Knuth-Morris-Pratt) Pattern Matching Algorithm
In this approach, we will use two pointers
implementation steps:
-
Initialize a variable
i pointer that we will use to track through stringhaystack . -
Then for each character at
ith position, loop through characters from stringneedle , and check characters in both strings matches, if all characters are matching then return the indexi , and if any of the character is not matching, then incrementi to move to next character inhaystack and continue. -
If no match found, return
-1 .
Complexity Analysis:
Time complexity: Above code runs in O(n * m) time where
Space complexity: O(1)
In this approach, we will apply
The main idea in this alogorithm is, in the naive approach, say we started matching characters from index
For example, say
Implementation steps:
-
There are two main steps in the implementation, one is building a LPS (longest prefix suffix) array from the
pattern string, then when we start matching characters and if a mismatch is found, instead going back to where it started, we can reset the search back to an index from LPS array. -
Steps for building LPS array:
-
Initialize two variables,
i=0 to track prefixes andj=1 to track suffixes of stringpattern .we are initializing j=1 is because at index 0, prefix and suffix are same and there is no need to compute LPS at index 0. -
Then loop through each character of
pattern string and check if character at indexi and indexj are same, if they are then update LPS array withlps[j] = i +1 , meaning that the previous prefix at current suffix index islps[i-1] and advance bothi andj .
If the characters are not same, then reseti tolps[i-1] to previous longest prefix position ifi has advanced, otherwise update currentlps array position with 0 and advancej .
static int[] createLPSArray(String pattern) { int[] lps = new int[pattern.length()]; int i=0; int j=1; // start from index 1, because at index 0, there is no character (previous) to compare with while(j < pattern.length()) { if(pattern.charAt(i) == pattern.charAt(j)) { lps[j] = i +1; i++; j++; } else { if (i != 0) { i = lps[i-1]; } else { lps[j] = 0; j++; } } } return lps; } -
Initialize two variables,
-
Steps for KMP algorithm:
-
Initialize two variables
i to iterate throughtext andj to iterate throughpattern . -
Then, loop through characters of both
text andpattern and if they are same, advance bothi andj . -
If the characters are not same, then if
j has advanced, resetj to a previously known prefix which is also a suffix.j = lps[j-1] , otherwise advancei , this is an important step.
For example consider
text= "abcabcyabcxabcyabczadbca" andpattern = "abcyabcz" -
First, we will compute the LPS array for
pattern string, it will be0, 0, 0, 0, 1, 2, 3, 0 . -
Then loop through characters of
text andpattern , first mismatch found at indexi = 3 , as there charactera intext and charactery in pattern, so we we resetjth index, whre it can start from, since there is no prefix found for the current suffix index,j = lps[j-1] which will be 0 and i remains at the current index where it is. -
Next, continue matching characters between
text andpattern , and the next mismatch found at indexi = 10 , astext has characterx andpattern hasz , and indexj is at7 , now we will resetj = lps[j-1] , which will become 3.What it means is that, current suffix highlighted in orangle color
abcyabcz is also a prefix highlighted in blue color, so what it tells is that, current suffix ended at index 6abc is already seen before prefix ending at index 3, we can start matching from 3rd index onwards, instead of going all the way back. As you can see in thetext string matching will start from character highlighted with blue colorabcabcyabcxabcyabczadbca andabc highlighted in orange color is already matched with prefix inpattern string, this way KMP algorithm helps to reset thepattern 's index such that characters already matched will not be needed to match again.
-
First, we will compute the LPS array for
-
After each character comparision, check if index
j reached end ofpattern string, that means we have found matching pattern, so returni -j to get the strating index of where thepattern started intext string. - If no match found, then return -1.
-
Initialize two variables
Complexity Analysis:
Time complexity: Above code runs in O(n) time where
Space complexity: O(m) where
Above implementations source code can be found at