The problem “Repeated DNA Sequences” asks you to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. A DNA sequence is composed of a series of nucleotides represented by the characters ‘A’, ‘C’, ‘G’, and ‘T’.
You need to:
For the sake of this problem, we will assume the input string could be up to 10^5 characters (as typical for coding challenges), is case-sensitive, and we just need to list the sequences without worrying about their positions.
By using sets, we can efficiently check for repeating sequences.
def findRepeatedDnaSequences(s: str):
if len(s) < 10:
return []
seen, repeated = set(), set()
for i in range(len(s) - 9):
sequence = s[i:i + 10]
if sequence in seen:
repeated.add(sequence)
else:
seen.add(sequence)
return list(repeated)
# Example usage
dna_sequence = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
print(findRepeatedDnaSequences(dna_sequence)) # Output: ["AAAAACCCCC", "CCCCCAAAAA"]
seen
to record all the 10-letter sequences we come across.repeated
to record sequences that appear more than once.repeated
set to a list as the output.This solution efficiently solves the problem by ensuring that all operations with sets (insertion, membership test) are performed in average O(1) time.
Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?