Given a string s
and an array of strings words
, return the number of words[i]
that are subsequences of s
.
A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.
For example, "ace"
is a subsequence of "abcde"
.
Example 1:
Input: s = "abcde", words = ["a", "bb", "acd", "ace"]
Output: 3
Explanation: There are three strings in words that are subsequences of s: "a", "acd", "ace".
Example 2:
Input: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]
Output: 2
words
contain duplicate strings?
s
always guaranteed to be non-empty?
s
is not empty.s
and words
are lowercase English letters?
1 <= s.length <= 5 * 10^4
1 <= words.length <= 5000
1 <= words[i].length <= 50
The naive approach would involve checking each word in words
against s
to see if it is a subsequence. This involves iterating over both the string s
and each word in words
, which may not be efficient for larger inputs. Instead, a more efficient approach can involve the following steps:
s
: Create a mapping from each character to all its positions in s
.words
, use the preprocessed mapping to check if the word can be constructed by following the increasing order of indices.s
:
pos
where each key is a character, and the value is a list of indices where this character appears in s
.words
, use the pos
dictionary to find positions of characters in s
and ensure they appear in increasing order.This approach ensures that each character lookup in s
is done in logarithmic time relative to the length of s
, resulting in a more efficient solution.
s
: O(n), where n is the length of s
.words
, the lookup steps take O(k log n) where k is the length of the word, leading to O(m * k log n) for all words (m being the number of words).Overall, the time complexity is O(n + m * k log n), which is efficient given the constraints.
from collections import defaultdict
import bisect
def numMatchingSubseq(s, words):
# Create a dictionary to map each character to a list of indices where they appear in `s`
pos = defaultdict(list)
for i, char in enumerate(s):
pos[char].append(i)
def is_subsequence(word):
# Current index in `s` to match characters of `word`
prev_index = -1
for char in word:
if char not in pos:
return False
# Use binary search to find the smallest index in pos[char] which is greater than prev_index
indices = pos[char]
idx = bisect.bisect_right(indices, prev_index)
if idx == len(indices):
return False
prev_index = indices[idx]
return True
# Count the number of words that are subsequences of `s`
return sum(is_subsequence(word) for word in words)
This code efficiently counts the number of subsequences in s
for the list of words
.
Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?