Leetcode 187. Repeated DNA Sequences
Leetcode Problem 187: Repeated DNA Sequences
Given a string s
representing a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Example:
s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
["AAAAACCCCC", "CCCCCAAAAA"]
Constraints:
0 <= s.length <= 10^5
s
consists of only ‘A’, ‘C’, ‘G’, and ‘T’.Q: Are the characters in the string only ‘A’, ‘C’, ‘G’, and ‘T’? A: Yes.
Q: Should the results be case-sensitive? A: Yes, the results should be in the same case as given (which is uppercase in the example).
Q: Can the substrings overlap? A: Yes, overlapping substrings are considered.
Q: Do we need to handle any special characters or whitespace? A: No, the problem guarantees that the input consists only of ‘A’, ‘C’, ‘G’, and ‘T’.
s
is less than 10, return an empty list since no sequence of 10 letters can be formed.import java.util.*;
public class RepeatedDNASequences {
public List<String> findRepeatedDnaSequences(String s) {
List<String> result = new ArrayList<>();
if (s == null || s.length() < 10) {
return result;
}
Map<String, Integer> sequenceCount = new HashMap<>();
for (int i = 0; i <= s.length() - 10; i++) {
String sequence = s.substring(i, i + 10);
sequenceCount.put(sequence, sequenceCount.getOrDefault(sequence, 0) + 1);
}
for (Map.Entry<String, Integer> entry : sequenceCount.entrySet()) {
if (entry.getValue() > 1) {
result.add(entry.getKey());
}
}
return result;
}
public static void main(String[] args) {
RepeatedDNASequences solution = new RepeatedDNASequences();
System.out.println(solution.findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
// Output: ["AAAAACCCCC", "CCCCCAAAAA"]
}
}
O(n)
, where n
is the length of the string s
. The for loop runs n-9
times, each taking constant time due to substring extraction and hashmap operations.O(n)
, where n
is the length of the string. This is due to the space needed for the hashmap to store all possible 10-letter sequences of the string.This approach ensures that we efficiently find all repeated 10-letter-long DNA sequences while maintaining optimal performance.
Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?