Strings - Find The Index of The First Occurrence in a String

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at indices 0 and 6. The first occurrence is at index 0, so we return 0.

Input: haystack = "cscodeio", needle = "cscd"
Output: -1
Explanation: "cscd" does not occur in "cscodeio", so we return -1.

Constraints:

1 <= haystack.length, needle.length <= 10⁴
haystack and needle consist of only lowercase English characters.

Solution 1 - Using two pointers, match characters from both strings
Solution 2 - Using KMP (Knuth-Morris-Pratt) Pattern Matching Algorithm

In the first approach, we use two pointers: i points into the input string haystack and j points into needle. For each candidate starting index i in haystack, we compare the characters of needle with the characters of haystack starting at i. If all of them match, we return i; otherwise we move i forward by one and repeat, until there are not enough characters left in haystack for needle to fit.

Implementation steps:

Initialize a pointer i that we use to walk through the string haystack.
For each position i, loop through the characters of needle with a second pointer j and check whether needle[j] matches haystack[i + j]. If all characters match, return i; as soon as one character does not match, stop, increment i to move to the next character in haystack, and continue.
If no match is found, return -1.

```java
public class IndexOfFirstOccurrenceInAString {

    static int indexOf(String haystack, String needle) {
        int i = 0;
        // try every starting index i where needle still fits inside haystack
        while (i < haystack.length() && i + needle.length() <= haystack.length()) {
            int j = 0;
            for (; j < needle.length(); j++) {
                if (needle.charAt(j) != haystack.charAt(i + j)) {
                    // not same characters
                    break;
                }
            }
            // j reached the end of needle: every character matched
            if (j == needle.length()) {
                return i;
            }
            i++;
        }
        return -1;
    }
}
```
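As a cross-check, the same brute-force idea can be written more compactly with the standard library method String.regionMatches(int, String, int, int), and its results compared with Java's built-in String.indexOf(String). This is a minimal sketch; the class name NaiveIndexOfSketch is hypothetical and not part of the original code.

```java
// Hypothetical helper class (not part of the original solution): the same
// brute-force idea written with String.regionMatches, checked against
// Java's built-in String.indexOf(String).
class NaiveIndexOfSketch {

    static int indexOf(String haystack, String needle) {
        int n = haystack.length();
        int m = needle.length();
        // Only starting positions with at least m characters left can match.
        for (int i = 0; i + m <= n; i++) {
            // regionMatches compares needle[0..m) with haystack[i..i+m).
            if (haystack.regionMatches(i, needle, 0, m)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"sadbutsad", "sad"},
            {"cscodeio", "cscd"},
        };
        for (String[] c : cases) {
            int ours = indexOf(c[0], c[1]);
            int builtIn = c[0].indexOf(c[1]);
            System.out.println(c[0] + ", " + c[1] + " -> " + ours + " (built-in: " + builtIn + ")");
        }
    }
}
```

For the two examples in the problem statement, both the sketch and the built-in method return 0 and -1 respectively.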

Complexity Analysis:

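Time complexity: The code above runs in O(n * m) time in the worst case, where n is the length of the input string haystack and m is the length of the input string needle: there are up to n - m + 1 starting positions, and each one can compare up to m characters.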
Space complexity: O(1)
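To see where the O(n * m) bound comes from, the sketch below counts the character comparisons made by the two-pointer loop on a worst-case style input. The class and method names are hypothetical; the loop mirrors the code above but returns a comparison count instead of an index. It uses String.repeat, which is available since Java 11.

```java
// Hypothetical instrumentation (not part of the original solution): the
// two-pointer loop, returning how many character comparisons it performs.
class NaiveComparisonCount {

    static long countComparisons(String haystack, String needle) {
        long comparisons = 0;
        for (int i = 0; i + needle.length() <= haystack.length(); i++) {
            int j = 0;
            while (j < needle.length()) {
                comparisons++;                          // one charAt comparison
                if (needle.charAt(j) != haystack.charAt(i + j)) {
                    break;                              // mismatch: try the next i
                }
                j++;
            }
            if (j == needle.length()) {
                return comparisons;                     // full match found
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // Worst-case shape: every start position matches m - 1 characters
        // before failing on the final 'b'.
        String haystack = "a".repeat(10);
        String needle = "aaab";
        System.out.println(countComparisons(haystack, needle));
    }
}
```

With haystack = "aaaaaaaaaa" (n = 10) and needle = "aaab" (m = 4), each of the n - m + 1 = 7 starting positions compares all 4 characters before failing on 'b', so the program prints 28 = 7 * 4.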

In the second approach, we apply the KMP (Knuth-Morris-Pratt) algorithm to find a pattern string pattern (here, needle) in a text string text (here, haystack).

The main idea of this algorithm is as follows. In the naive approach, if we start matching characters at index i and find a mismatch between pattern and text at index j, we start over from i + 1, which takes O(n * m) time. The KMP algorithm avoids starting over from the beginning: it uses what is already known about the matched characters to skip the starting characters of the pattern that are guaranteed to match, so the pointer into text never moves backward. This improves the running time to O(n + m), that is, linear in the length of the input.

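For example, say text = "aaaab" and pattern = "aaab". When we start matching characters at index 0, the first three characters match, but we find a mismatch at the 4th character (index 3), where text has 'a' and pattern has 'b'. The naive approach then has to start again from the 2nd character of text (index 1) and compare "aa" again, even though those characters are already known to match. This is where the KMP algorithm helps.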

Implementation steps:

There are two main steps in the implementation. First, build an LPS (longest prefix suffix) array from the pattern string, where lps[k] is the length of the longest proper prefix of pattern[0..k] that is also a suffix of pattern[0..k]. Then, while matching characters, if a mismatch is found, instead of going back to where the current attempt started, we reset the position in the pattern to an index taken from the LPS array.
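The definition of lps[k] can be checked directly with a brute-force sketch that, for every index k, tries all proper prefix lengths from longest to shortest. It runs in O(m^3) time, so it is only meant for verifying the faster construction on small patterns; the class name LpsByDefinition is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper (not the article's createLPSArray): computes each lps[k]
// straight from the definition, so the linear-time construction can be
// checked against it on small inputs.
class LpsByDefinition {

    static int[] lps(String pattern) {
        int m = pattern.length();
        int[] lps = new int[m];
        for (int k = 0; k < m; k++) {
            // Proper prefixes of pattern[0..k] have length k at most; try the longest first.
            for (int len = k; len >= 1; len--) {
                // Compare prefix pattern[0..len) with the suffix pattern[k-len+1..k].
                if (pattern.regionMatches(0, pattern, k - len + 1, len)) {
                    lps[k] = len;
                    break;
                }
            }
            // If no length matched, lps[k] stays 0.
        }
        return lps;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lps("abcyabcz")));
        System.out.println(Arrays.toString(lps("aaab")));
    }
}
```

This prints [0, 0, 0, 0, 1, 2, 3, 0] for "abcyabcz" and [0, 1, 2, 0] for "aaab".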
Steps for building the LPS array:
- Initialize two variables: i = 0 to track prefixes and j = 1 to track suffixes of the string pattern. We start with j = 1 because lps[0] is always 0: the only proper prefix of a single character is the empty string.
- Then loop through the characters of pattern and check whether the characters at index i and index j are the same. If they are, set lps[j] = i + 1, meaning that the prefix pattern[0..i] of length i + 1 is also a suffix of pattern[0..j], and advance both i and j.
  If the characters are not the same and i has advanced (i != 0), reset i to lps[i-1], the next shorter prefix that is also a suffix, and compare again without advancing j. Otherwise (i == 0), set lps[j] = 0 and advance j.
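The fallback step (i = lps[i-1]) is the least obvious part of the construction, so here is a sketch of the steps above that prints each fallback. It assumes the same variable roles (i for the current prefix length, j for the index being filled); the class LpsTrace and the printed message format are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical tracing variant of the LPS construction (not the article's
// createLPSArray): prints every fallback of i on a mismatch.
class LpsTrace {

    static int[] buildLps(String pattern) {
        int[] lps = new int[pattern.length()];   // lps[0] is always 0
        int i = 0;                               // length of the prefix matched so far
        int j = 1;                               // index whose lps value is being computed
        while (j < pattern.length()) {
            if (pattern.charAt(i) == pattern.charAt(j)) {
                lps[j] = i + 1;                  // prefix pattern[0..i] is also a suffix ending at j
                i++;
                j++;
            } else if (i != 0) {
                // Fall back to the next shorter prefix that is also a suffix;
                // j does not move, so pattern[j] is compared again.
                System.out.println("j=" + j + ": i falls back " + i + " -> " + lps[i - 1]);
                i = lps[i - 1];
            } else {
                lps[j] = 0;                      // no prefix ends here
                j++;
            }
        }
        return lps;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(buildLps("aabaaab")));
    }
}
```

For the pattern "aabaaab" it prints two fallbacks, at j = 2 (i goes from 1 to 0) and at j = 5 (i goes from 2 to 1), followed by the final array [0, 1, 0, 1, 2, 2, 3].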
Steps for the KMP algorithm:
- Initialize two variables: i to iterate through text and j to iterate through pattern.
- Then loop through the characters of text and pattern, and if the characters at i and j are the same, advance both i and j.
- If the characters are not the same and j has advanced (j != 0), reset j to the length of the longest prefix that is also a suffix of the part matched so far, j = lps[j-1], and keep i where it is. Otherwise (j == 0), advance i. This is the important step: i never moves backward.
  
  For example, consider text = "abcabcyabcxabcyabczadbca" and pattern = "abcyabcz".
  1. First, we compute the LPS array for the pattern string: it is 0, 0, 0, 0, 1, 2, 3, 0.
  2. Then we loop through the characters of text and pattern. The first mismatch is found at index i = 3, because text has the character 'a' while pattern has 'y' (j = 3). Since no proper prefix of "abc" is also a suffix of it, j = lps[j-1] = lps[2] = 0, and i stays at its current index.
  3. Next, we continue matching characters between text and pattern, and the next mismatch is found at index i = 10: text has the character 'x' and pattern has 'z', with j = 7. Now we reset j = lps[j-1] = lps[6], which is 3.
    This means that the suffix "abc" of the matched part "abcyabc" (pattern indices 4 to 6) is also a prefix of the pattern (indices 0 to 2). The characters text[7..9] = "abc" have already matched pattern[4..6], so they also match pattern[0..2], and we can continue matching from pattern index 3 onwards instead of going all the way back. In the text, matching resumes at index 10 (the character 'x'), while the "abc" just before it counts as already matched. This way, the KMP algorithm resets the pattern's index so that characters that have already been matched do not need to be compared again. Here 'x' also differs from pattern[3] = 'y' and from pattern[0] = 'a', so j drops to 0 and i advances to 11, from where all 8 characters of the pattern match.
- After each character comparison, check whether index j has reached the end of the pattern string. If it has, we have found the pattern, so return i - j, the starting index of the match in the text string.
- If no match found, then return -1.
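To follow the worked example step by step, here is a sketch of the KMP search that prints every reset of j. The search logic follows the steps above; the class KmpTrace and its output format are hypothetical, and it checks for a full match after the loop rather than inside it, which gives the same result.

```java
// Hypothetical tracing variant (not the article's code): the KMP search loop,
// printing every time j falls back through the LPS array.
class KmpTrace {

    // Linear-time LPS construction, following the steps for building the LPS array.
    static int[] buildLps(String pattern) {
        int[] lps = new int[pattern.length()];
        int i = 0; // length of the prefix matched so far
        int j = 1; // index whose lps value is being computed
        while (j < pattern.length()) {
            if (pattern.charAt(i) == pattern.charAt(j)) {
                lps[j] = i + 1;
                i++;
                j++;
            } else if (i != 0) {
                i = lps[i - 1];
            } else {
                lps[j] = 0;
                j++;
            }
        }
        return lps;
    }

    static int search(String text, String pattern) {
        int[] lps = buildLps(pattern);
        int i = 0; // index into text, never moves backward
        int j = 0; // index into pattern
        while (i < text.length() && j < pattern.length()) {
            if (text.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            } else if (j != 0) {
                int newJ = lps[j - 1];
                System.out.println("mismatch at i=" + i + ", j=" + j + " -> j=" + newJ);
                j = newJ;
            } else {
                i++; // nothing matched at this position: move on in the text
            }
        }
        // The loop stops once the whole pattern matched or the text ran out.
        return j == pattern.length() ? i - j : -1;
    }

    public static void main(String[] args) {
        int index = search("abcabcyabcxabcyabczadbca", "abcyabcz");
        System.out.println("found at index " + index);
    }
}
```

On the example text and pattern it reports resets at (i = 3, j = 3 -> 0), (i = 10, j = 7 -> 3) and (i = 10, j = 3 -> 0), and then finds the pattern at index 11.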

```java
public class IndexOfFirstOccurrenceInAString {

    /**
     * Pattern matching using KMP algorithm
     * 1. build a LPS array (longest prefix suffix array)
     * 2. match characters between haystack and needle
     *    if any mismatch found, then reset needle index to a prefix index from LPS array
     */
    static int patternMatchingUsingKMP(String haystack, String needle) {
        int[] lps = createLPSArray(needle);
        int i = 0; // track haystack
        int j = 0; // track needle
        while (i < haystack.length() && j < needle.length()) {
            if (haystack.charAt(i) == needle.charAt(j)) {
                i++;
                j++;
            } else {
                if (j != 0) {
                    j = lps[j - 1];
                } else {
                    i++;
                }
            }
            if (j == needle.length()) {
                return i - j;
            }
        }
        return -1;
    }

    static int[] createLPSArray(String pattern) {
        int[] lps = new int[pattern.length()];
        int i = 0;
        int j = 1; // start from index 1, because at index 0, there is no character (previous) to compare with
        while (j < pattern.length()) {
            if (pattern.charAt(i) == pattern.charAt(j)) {
                lps[j] = i + 1;
                i++;
                j++;
            } else {
                if (i != 0) {
                    i = lps[i - 1];
                } else {
                    lps[j] = 0;
                    j++;
                }
            }
        }
        return lps;
    }
}
```

Complexity Analysis:

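Time complexity: The code above runs in O(n + m) time, where n is the length of the input string haystack and m is the length of the input string needle. Building the LPS array takes O(m) time, and the search loop takes O(n) time, because each iteration either advances i or moves j back, and j can move back at most as many times as it has moved forward. When needle is not longer than haystack (otherwise there can be no match), this is O(n).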
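To make the linear bound concrete, the sketch below counts the character comparisons in the KMP search loop (not including the LPS construction) on text = "a" repeated n times and pattern = "aaab", a worst case for the naive approach. Each comparison either advances i or strictly decreases j, so the count is at most 2n. The class name KmpComparisonCount is hypothetical, and String.repeat requires Java 11 or later.

```java
// Hypothetical instrumentation (not part of the original solution): counts the
// character comparisons made by the KMP search loop.
class KmpComparisonCount {

    // Linear-time LPS construction, as described in the steps above.
    static int[] buildLps(String pattern) {
        int[] lps = new int[pattern.length()];
        int i = 0;
        int j = 1;
        while (j < pattern.length()) {
            if (pattern.charAt(i) == pattern.charAt(j)) {
                lps[j] = i + 1;
                i++;
                j++;
            } else if (i != 0) {
                i = lps[i - 1];
            } else {
                lps[j] = 0;
                j++;
            }
        }
        return lps;
    }

    static long countComparisons(String text, String pattern) {
        int[] lps = buildLps(pattern);
        long comparisons = 0;
        int i = 0;
        int j = 0;
        while (i < text.length() && j < pattern.length()) {
            comparisons++;            // one text/pattern character comparison
            if (text.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            } else if (j != 0) {
                j = lps[j - 1];       // i stays put; only j moves back
            } else {
                i++;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 1000}) {
            System.out.println("n=" + n + ": " + countComparisons("a".repeat(n), "aaab"));
        }
    }
}
```

For n = 10 it counts 17 comparisons (the naive loop makes (10 - 4 + 1) * 4 = 28), and for n = 1000 it counts 1997, compared with 997 * 4 = 3988 for the naive loop.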
Space complexity: O(m) where m is the length of input string needle.

The source code for the above implementations can be found at the GitHub link for the Java code.
