algoadvance

Leetcode 187. Repeated DNA Sequences

Problem Statement

Leetcode Problem 187: Repeated DNA Sequences

Given a string s representing a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Example:

Constraints:

Clarifying Questions

  1. Q: Are the characters in the string only ‘A’, ‘C’, ‘G’, and ‘T’? A: Yes.

  2. Q: Should the results be case-sensitive? A: Yes, the results should be in the same case as given (which is uppercase in the example).

  3. Q: Can the substrings overlap? A: Yes, overlapping substrings are considered.

  4. Q: Do we need to handle any special characters or whitespace? A: No, the problem guarantees that the input consists only of ‘A’, ‘C’, ‘G’, and ‘T’.

Strategy

  1. Sliding Window: Use a sliding window of size 10 to traverse the string.
  2. Use of HashMap: Utilize a HashMap (or dictionary) to count the occurrences of each 10-letter sequence.
  3. Set for Results: Use a set to collect sequences that appear more than once to avoid duplicates.
  4. Edge Cases:
    • If the length of s is less than 10, return an empty list since no sequence of 10 letters can be formed.

Code

import java.util.*;

public class RepeatedDNASequences {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<>();
        if (s == null || s.length() < 10) {
            return result;
        }
        
        Map<String, Integer> sequenceCount = new HashMap<>();
        
        for (int i = 0; i <= s.length() - 10; i++) {
            String sequence = s.substring(i, i + 10);
            sequenceCount.put(sequence, sequenceCount.getOrDefault(sequence, 0) + 1);
        }
        
        for (Map.Entry<String, Integer> entry : sequenceCount.entrySet()) {
            if (entry.getValue() > 1) {
                result.add(entry.getKey());
            }
        }
        
        return result;
    }
    
    public static void main(String[] args) {
        RepeatedDNASequences solution = new RepeatedDNASequences();
        System.out.println(solution.findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // Output: ["AAAAACCCCC", "CCCCCAAAAA"]
    }
}

Time Complexity

This approach ensures that we efficiently find all repeated 10-letter-long DNA sequences while maintaining optimal performance.

Cut your prep time in half and DOMINATE your interview with AlgoAdvance AI