algoadvance

Problem Statement

Leetcode Problem 187: Repeated DNA Sequences

Given a string s representing a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Constraints:

0 <= s.length <= 10^5
s consists of only ‘A’, ‘C’, ‘G’, and ‘T’.

Clarifying Questions

Q: Are the characters in the string only ‘A’, ‘C’, ‘G’, and ‘T’? A: Yes.
Q: Should the results be case-sensitive? A: Yes, the results should be in the same case as given (which is uppercase in the example).
Q: Can the substrings overlap? A: Yes, overlapping substrings are considered.
Q: Do we need to handle any special characters or whitespace? A: No, the problem guarantees that the input consists only of ‘A’, ‘C’, ‘G’, and ‘T’.

Strategy

Sliding Window: Use a sliding window of size 10 to traverse the string.
Use of HashMap: Utilize a HashMap (or dictionary) to count the occurrences of each 10-letter sequence.
Set for Results: Use a set to collect sequences that appear more than once to avoid duplicates.
Edge Cases:
- If the length of s is less than 10, return an empty list since no sequence of 10 letters can be formed.

Code

import java.util.*;

public class RepeatedDNASequences {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<>();
        if (s == null || s.length() < 10) {
            return result;
        }
        
        Map<String, Integer> sequenceCount = new HashMap<>();
        
        for (int i = 0; i <= s.length() - 10; i++) {
            String sequence = s.substring(i, i + 10);
            sequenceCount.put(sequence, sequenceCount.getOrDefault(sequence, 0) + 1);
        }
        
        for (Map.Entry<String, Integer> entry : sequenceCount.entrySet()) {
            if (entry.getValue() > 1) {
                result.add(entry.getKey());
            }
        }
        
        return result;
    }
    
    public static void main(String[] args) {
        RepeatedDNASequences solution = new RepeatedDNASequences();
        System.out.println(solution.findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // Output: ["AAAAACCCCC", "CCCCCAAAAA"]
    }
}

Time Complexity

Time Complexity: O(n), where n is the length of the string s. The for loop runs n-9 times, each taking constant time due to substring extraction and hashmap operations.
Space Complexity: O(n), where n is the length of the string. This is due to the space needed for the hashmap to store all possible 10-letter sequences of the string.

This approach ensures that we efficiently find all repeated 10-letter-long DNA sequences while maintaining optimal performance.

Cut your prep time in half and DOMINATE your interview with AlgoAdvance AI

Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?