Leetcode 792. Number of Matching Subsequences
Given a string s
and an array of strings words
, return the number of words[i] that is a subsequence of s
.
A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.
s = "abcde", words = ["a", "bb", "acd", "ace"]
3
words
that are subsequences of s
: "a"
, "acd"
, "ace"
.Here are some questions to clarify the problem before diving into the solution:
s
and the words in the words
array be very large?We will proceed assuming the issues are case-sensitive, all characters are lower-case, and within typical input size limits for a standard interview problem.
s
or words
are empty and return 0 in such cases.s
to store the indices of each character in a hash map.words
, use the pre-processed indices to efficiently check if the word is a subsequence of s
.
s
.Here’s the implementation of the above strategy in C++:
#include <vector>
#include <string>
#include <unordered_map>
#include <iostream>
using namespace std;
bool isSubsequence(unordered_map<char, vector<int>>& char_indices, string& s, string& word) {
int prev_index = -1;
for (char c : word) {
if (char_indices.find(c) == char_indices.end()) {
return false;
}
auto &indices = char_indices[c];
auto it = upper_bound(indices.begin(), indices.end(), prev_index);
if (it == indices.end()) {
return false;
}
prev_index = *it;
}
return true;
}
int numMatchingSubseq(string s, vector<string>& words) {
unordered_map<char, vector<int>> char_indices;
for (int i = 0; i < s.size(); ++i) {
char_indices[s[i]].push_back(i);
}
int count = 0;
for (string& word : words) {
if (isSubsequence(char_indices, s, word)) {
++count;
}
}
return count;
}
int main() {
string s = "abcde";
vector<string> words = {"a", "bb", "acd", "ace"};
cout << numMatchingSubseq(s, words) << endl; // Output: 3
return 0;
}
The time complexity analysis of this solution:
char_indices
map takes O(n), where n is the length of string s
.words
, checking if it is a subsequence involves binary search within the character indices. If m
is the average length of the words and k
is the number of words:
k
words gives O(k * m * log n).Hence, the overall time complexity is O(n + k * m * log n), which is efficient for typical input sizes in interview problems.
Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?