KMP Algorithm Efficient String Search | Generated by AI

Home 2025.10

Understanding the Knuth-Morris-Pratt (KMP) Algorithm

You’re spot on with your intuition: KMP is all about not wasting partial matches during string searching. When you’re looking for a pattern (like “abc”) in a longer text (like “ababcabc”), the naive approach restarts from scratch every time there’s a mismatch, which is inefficient. KMP smartly “remembers” how much of the pattern you’ve already matched and jumps ahead, avoiding redundant comparisons. This makes it super fast—linear time, O(n + m), where n is text length and m is pattern length.

I’ll break it down step-by-step with a simple example. We’ll search for the pattern P = "abab" in the text T = "ababababc". (It appears at positions 0 and 2.)

Step 1: The Problem and Naive Approach

This can be O(n*m) in the worst case (e.g., searching “aaaaa…a” for “aaa…ab”).

Step 2: KMP’s Key Idea – The Prefix Table (or “Failure Function”)

KMP precomputes a table π (pi) for the pattern P. This table tells you, for each position i in P, the longest proper prefix of P[0..i] that is also a suffix. In other words: “If we mismatch here, how much of the partial match can we reuse by jumping to this overlapping prefix?”

For P = "abab":

Position i P[0..i] Longest proper prefix = suffix π[i]
0 “a” None (single char) 0
1 “ab” None 0
2 “aba” “a” (prefix “a” == suffix “a”) 1
3 “abab” “ab” (prefix “ab” == suffix “ab”) 2

Step 3: Building the Prefix Table (π)

This is done in O(m) time. It’s like searching P against itself, using a similar logic.

Pseudocode:

def compute_prefix_function(P):
    m = len(P)
    pi = [0] * m
    k = 0  # Length of current prefix-suffix match
    for i in range(1, m):
        while k > 0 and P[k] != P[i]:
            k = pi[k-1]  # Jump to previous overlap (reuse!)
        if P[k] == P[i]:
            k += 1
        pi[i] = k
    return pi

For “abab”:

Step 4: Searching with the Prefix Table

Now search T with P and π:

Pseudocode:

def kmp_search(T, P):
    n, m = len(T), len(P)
    if m == 0: return []
    pi = compute_prefix_function(P)
    q = 0
    matches = []
    for i in range(n):
        while q > 0 and P[q] != T[i]:
            q = pi[q-1]
        if P[q] == T[i]:
            q += 1
        if q == m:
            matches.append(i - m + 1)  # Starting position
            q = pi[q-1]  # Overlap for next search
    return matches

For T = "ababababc", P = "abab":

Wait, correction on trace (full run yields matches at 0 and 2; at i=7: after q=4 at i=7? No:

Wait, actually for this T, matches at 0,2,4. Yes!

See? On mismatches (like if there was one), it jumps q=π[q-1] instead of q=0.

Why It Feels Magical

Try It Yourself: Simple Python Code

def compute_prefix_function(P):
    m = len(P)
    pi = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and P[k] != P[i]:
            k = pi[k - 1]
        if P[k] == P[i]:
            k += 1
        pi[i] = k
    return pi

def kmp_search(T, P):
    n, m = len(T), len(P)
    if m == 0: return []
    pi = compute_prefix_function(P)
    q = 0
    matches = []
    for i in range(n):
        while q > 0 and P[q] != T[i]:
            q = pi[q - 1]
        if P[q] == T[i]:
            q += 1
        if q == m:
            matches.append(i - m + 1)
            q = pi[q - 1]
    return matches

# Test
T = "ababababc"
P = "abab"
print(kmp_search(T, P))  # Output: [0, 2, 4]

Run this—tweak P/T to see jumps in action (add prints for q).

Common Pitfalls & Tips

This should click now—it’s just smart skipping via precomputed overlaps. If you have a specific example that’s tripping you up, share it!

GeeksforGeeks: KMP Algorithm
Wikipedia: Knuth–Morris–Pratt Algorithm


Back

x-ai/grok-4-fast

Donate