The problem is as follows:
A gene string can be represented by an 8-character-long string, with choices of characters being ‘A’, ‘C’, ‘G’, and ‘T’. Suppose we need to investigate a mutation from a start gene string to an end gene string such that only one gene can change at a time and each intermediate gene string must also be a valid gene string that exists in a given bank of allowed gene strings.
We need to determine the minimum number of mutations needed to transform the start string into the end string. If there is no valid transformation sequence, return -1.
start = "AAAAACCC"
end = "AACCCCCC"
bank = ["AAAACCCC", "AAACCCCC", "AACCCCCC"]
Output: 3
start
string be the same as the end
string?
start
string is the same as the end
string, the answer should be 0 mutations.end
string is in the bank
?
end
string is not in the bank
, there is no valid mutation path, and the answer should be -1.start
to the end
string if it exists.bank
.from collections import deque
def minMutation(start: str, end: str, bank: list) -> int:
# Convert the bank list to a set for efficient look-up
bank_set = set(bank)
# If the end string is not in the bank, return -1
if end not in bank_set:
return -1
# Initialize the BFS queue
queue = deque([(start, 0)]) # (current_string, current_mutations_count)
# Helper function to find all possible single step mutations
def get_mutations(current):
mutations = []
for i in range(len(current)):
for c in 'ACGT':
if current[i] != c:
mutation = current[:i] + c + current[i+1:]
if mutation in bank_set:
mutations.append(mutation)
return mutations
# BFS traversal
while queue:
current, mutations_count = queue.popleft()
# If we reach the end string, return the number of mutations
if current == end:
return mutations_count
# Explore all possible one-step mutations
for mutation in get_mutations(current):
# Remove the mutation from the bank set to avoid revisiting
bank_set.remove(mutation)
queue.append((mutation, mutations_count + 1))
# If there is no valid mutation path, return -1
return -1
# Example usage
start = "AAAAACCC"
end = "AACCCCCC"
bank = ["AAAACCCC", "AAACCCCC", "AACCCCCC"]
print(minMutation(start, end, bank)) # Output: 3
O(4 * 8 * N)
, where N
is the length of the bank. This comes from:
O(N)
for the BFS queue and the set used to track valid gene strings.Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?