algoadvance

Problem Statement

LeetCode 187: Repeated DNA Sequences

The problem statement is as follows:

All DNA is composed of a series of nucleotides abbreviated as ‘A’, ‘C’, ‘G’, and ‘T’, for example: “ACGAATTCCG”. When studying DNA, it is generally useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example 1:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]
Example 2:
Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]

Clarifying Questions

Can the input string be empty or less than 10 characters?
- If so, should we return an empty list?
Are we considering only sequences that appear more than once?
Is the input string always uppercase?
Can we assume that the input string consists only of ‘A’, ‘C’, ‘G’, and ‘T’?

Assuming the input string will be valid and consist of uppercase characters ‘A’, ‘C’, ‘G’, and ‘T’.

Strategy

To solve this problem, let’s follow these steps:

Use a hash set to keep track of all seen 10-letter-long substrings.
Use another hash set to identify substrings that occur more than once.
Iterate through the string, and for each 10-letter-long substring, check if it has been seen before.
If it has been seen before, add it to the result set which stores duplicates.
Convert the result set to a list and return it.

Code

#include <vector>
#include <string>
#include <unordered_set>

using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_set<string> seen, repeated;
        vector<string> result;
        
        for (int i = 0; i + 9 < s.size(); ++i) {
            string substring = s.substr(i, 10);
            if (seen.find(substring) != seen.end()) {
                repeated.insert(substring);
            } else {
                seen.insert(substring);
            }
        }
        
        for (const string& str : repeated) {
            result.push_back(str);
        }
        
        return result;
    }
};

Time Complexity

Time Complexity:
- Iterating over the string and extracting substrings takes O(n).
- Insertion and lookup in hash sets are average O(1).
- So, the overall time complexity is O(n).
Space Complexity:
- We are using two sets to store substrings, which could each potentially store all n-10+1 substrings in the worst case.
- So, the space complexity is O(n).

Cut your prep time in half and DOMINATE your interview with AlgoAdvance AI

Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?