algoadvance

Leetcode 792. Number of Matching Subsequences

Problem Statement

Given a string s and an array of strings words, return the number of words[i] that is a subsequence of s.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

Example:

Input: s = "abcde", words = ["a", "bb", "acd", "ace"]
Output: 3
Explanation: There are three words in words that are subsequences of s: "a", "acd", "ace".

Clarifying Questions

Here are some questions to clarify the problem before diving into the solution:

Input Size: Can the length of the string s and the words in the words array be very large?
Case Sensitivity: Is this problem case-sensitive? (Assume it is unless stated otherwise.)
Characters: Are the characters strictly lower-case alphabets?
Constraints: Are there any constraints or special cases to consider?

We will proceed assuming the issues are case-sensitive, all characters are lower-case, and within typical input size limits for a standard interview problem.

Strategy

Initial Verification: Check if either s or words are empty and return 0 in such cases.
Character Indexing: Pre-process the string s to store the indices of each character in a hash map.
Check Subsequences: For each word in words, use the pre-processed indices to efficiently check if the word is a subsequence of s.
- Use a pointer to check the validity of each character in the current word.
Counting Subsequences: Count how many words from the list are subsequences of s.

Code

Here’s the implementation of the above strategy in C++:

#include <vector>
#include <string>
#include <unordered_map>
#include <iostream>
using namespace std;

bool isSubsequence(unordered_map<char, vector<int>>& char_indices, string& s, string& word) {
    int prev_index = -1;
    for (char c : word) {
        if (char_indices.find(c) == char_indices.end()) {
            return false;
        }
        auto &indices = char_indices[c];
        auto it = upper_bound(indices.begin(), indices.end(), prev_index);
        if (it == indices.end()) {
            return false;
        }
        prev_index = *it;
    }
    return true;
}

int numMatchingSubseq(string s, vector<string>& words) {
    unordered_map<char, vector<int>> char_indices;
    for (int i = 0; i < s.size(); ++i) {
        char_indices[s[i]].push_back(i);
    }

    int count = 0;
    for (string& word : words) {
        if (isSubsequence(char_indices, s, word)) {
            ++count;
        }
    }

    return count;
}

int main() {
    string s = "abcde";
    vector<string> words = {"a", "bb", "acd", "ace"};
    cout << numMatchingSubseq(s, words) << endl;  // Output: 3
    return 0;
}

Time Complexity

The time complexity analysis of this solution:

Pre-processing: Building the char_indices map takes O(n), where n is the length of string s.
Checking Subsequences: For each word in words, checking if it is a subsequence involves binary search within the character indices. If m is the average length of the words and k is the number of words:
- Each binary search on average takes O(log n).
- Checking each word thus takes O(m * log n).
- Doing this for k words gives O(k * m * log n).

Hence, the overall time complexity is O(n + k * m * log n), which is efficient for typical input sizes in interview problems.

Cut your prep time in half and DOMINATE your interview with AlgoAdvance AI

Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?