algoadvance

Given a string s and an array of strings words, return the number of words[i] that are subsequences of s.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

For example, "ace" is a subsequence of "abcde".

Example 1:

Input: s = "abcde", words = ["a", "bb", "acd", "ace"]
Output: 3
Explanation: There are three strings in words that are subsequences of s: "a", "acd", "ace".

Example 2:

Input: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]
Output: 2

Clarifying Questions

  1. Can words contain duplicate strings?
    • Yes, but they should still be counted individually.
  2. Is s always guaranteed to be non-empty?
    • Yes, you can assume s is not empty.
  3. Can we assume that all characters in s and words are lowercase English letters?
    • Yes, we can assume that.
  4. What are the constraints on the inputs?
    • 1 <= s.length <= 5 * 10^4
    • 1 <= words.length <= 5000
    • 1 <= words[i].length <= 50

Strategy

The naive approach would involve checking each word in words against s to see if it is a subsequence. This involves iterating over both the string s and each word in words, which may not be efficient for larger inputs. Instead, a more efficient approach can involve the following steps:

  1. Preprocess the string s: Create a mapping from each character to all its positions in s.
  2. Subsequence Matching: For each word in words, use the preprocessed mapping to check if the word can be constructed by following the increasing order of indices.

Detailed Steps:

  1. Preprocess s:
    • Create a dictionary pos where each key is a character, and the value is a list of indices where this character appears in s.
  2. Check for Subsequences:
    • For each word in words, use the pos dictionary to find positions of characters in s and ensure they appear in increasing order.

This approach ensures that each character lookup in s is done in logarithmic time relative to the length of s, resulting in a more efficient solution.

Time Complexity

Overall, the time complexity is O(n + m * k log n), which is efficient given the constraints.

Code

from collections import defaultdict
import bisect

def numMatchingSubseq(s, words):
    # Create a dictionary to map each character to a list of indices where they appear in `s`
    pos = defaultdict(list)
    for i, char in enumerate(s):
        pos[char].append(i)
    
    def is_subsequence(word):
        # Current index in `s` to match characters of `word`
        prev_index = -1
        for char in word:
            if char not in pos:
                return False
            # Use binary search to find the smallest index in pos[char] which is greater than prev_index
            indices = pos[char]
            idx = bisect.bisect_right(indices, prev_index)
            if idx == len(indices):
                return False
            prev_index = indices[idx]
        return True
    
    # Count the number of words that are subsequences of `s`
    return sum(is_subsequence(word) for word in words)

This code efficiently counts the number of subsequences in s for the list of words.

Try our interview co-pilot at AlgoAdvance.com