===================
== Thomas Pinder ==
===================
Bayesian ML, Causal Inference, and JAX


Is A String a Substring of Another String?

Problem: Given two strings s and t, return True if s is a subsequence of t, or False otherwise.

Testing Function
1
2
3
4
5
6
7
8
from typing import Callable


def test(approach: Callable[[str, str], bool]) -> None:
    test1 = approach("abc", "ahbgdc") == True
    test2 = approach("axc", "ahbgdc") == False
    status = "Pass" if test1 and test2 else "Fail"
    print(f">>> Result: `{approach.__name__}` - {status}")

Brute Force Approach

A sequence implies that order matters. Therefore, we’ll be traversing t and checking if we can find the characters of s in order. The order here helps and it immediately makes me think that a two-pointer technique is a good fit. I think it is often helpful to imagine what the brute force solution looks like. In this case, we’d be checking every possible subsequence of t to see if it matches s.

This would look something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
def brute_force(s: str, t: str) -> bool:
    for i in range(len(t)):
        for j in range(len(s)):
            if t[i] == s[j]:
                s = s[1:]
                break
        if not s:
            return True
    return False


test(brute_force)
>>> Result: `brute_force` - Pass

The time complexity of the above brute-force approach solution is $\mathcal{O}(M \times N)$, where $M$ is the length of t and $N$ is the length of s. This is not great, but it does help provide us a baseline and, at the very worst, we have provided a working solution.

Two-Pointer Approach

As I mentioned earlier, my immediate reaction here is that we can use a two-pointer technique. To achieve this, we’ll initialise two pointers i and j to 0. We’ll then iterate over t and s using these pointers. If the characters at the current pointers match, we’ll increment j. Otherwise, we’ll increment i. If we manage to iterate over all of s, then s is a subsequence of t.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
def two_pointers(s: str, t: str) -> bool:
    # Initialise the pointers to 0
    i = 0
    j = 0
    # Residual is the number of characters in the substring left to find
    residual = len(s)
    # Whilst the left pointer is strictly less than the number of
    # characters in s and j less than those t, keep moving along.
    while i < len(s) and j < len(t):
        # For the current pointer pair, increment the subsequences' pointer
        # if there is match in the superstring
        if s[i] == t[j]:
            i += 1
            residual -= 1
        # If no match was found for the current right pointer, increment.
        j += 1
    # If there was not characters left to check, then we have a substring
    result = residual == 0
    return result

test(two_pointers)
>>> Result: `two_pointers` - Pass

The time complexity of the above approach is $\mathcal{O}(M)$, where $M$ is the length of t. This is a significant improvement over the brute-force approach. Perhaps the bigger improvement is the storage complexity which, due to the use of two pointers, is $\mathcal{O}(1)$.