algoadvance

The problem is as follows:

A gene string can be represented by an 8-character-long string, with choices of characters being ‘A’, ‘C’, ‘G’, and ‘T’. Suppose we need to investigate a mutation from a start gene string to an end gene string such that only one gene can change at a time and each intermediate gene string must also be a valid gene string that exists in a given bank of allowed gene strings.

We need to determine the minimum number of mutations needed to transform the start string into the end string. If there is no valid transformation sequence, return -1.

Example:

start = "AAAAACCC"
end = "AACCCCCC"
bank = ["AAAACCCC", "AAACCCCC", "AACCCCCC"]
Output: 3

Clarifying Questions

  1. Can the start string be the same as the end string?
    • Yes, if the start string is the same as the end string, the answer should be 0 mutations.
  2. Is it guaranteed that the end string is in the bank?
    • No, if the end string is not in the bank, there is no valid mutation path, and the answer should be -1.
  3. Can the bank have duplicate gene strings?
    • The problem typically assumes a unique set of gene strings in the bank, but we can treat them as unique if they appear.

Strategy

  1. Breadth-First Search (BFS):
    • We’ll utilize BFS to explore mutations level by level. This approach ensures that we can find the shortest path (minimum mutations) from the start to the end string if it exists.
  2. Possible Mutations:
    • Generate all possible mutations that can be one character away from the current string and check if they are in the bank.

Code

from collections import deque

def minMutation(start: str, end: str, bank: list) -> int:
    # Convert the bank list to a set for efficient look-up
    bank_set = set(bank)
    
    # If the end string is not in the bank, return -1
    if end not in bank_set:
        return -1
    
    # Initialize the BFS queue
    queue = deque([(start, 0)])  # (current_string, current_mutations_count)
    
    # Helper function to find all possible single step mutations
    def get_mutations(current):
        mutations = []
        for i in range(len(current)):
            for c in 'ACGT':
                if current[i] != c:
                    mutation = current[:i] + c + current[i+1:]
                    if mutation in bank_set:
                        mutations.append(mutation)
        return mutations
    
    # BFS traversal
    while queue:
        current, mutations_count = queue.popleft()
        
        # If we reach the end string, return the number of mutations
        if current == end:
            return mutations_count
        
        # Explore all possible one-step mutations
        for mutation in get_mutations(current):
            # Remove the mutation from the bank set to avoid revisiting
            bank_set.remove(mutation)
            queue.append((mutation, mutations_count + 1))
    
    # If there is no valid mutation path, return -1
    return -1

# Example usage
start = "AAAAACCC"
end = "AACCCCCC"
bank = ["AAAACCCC", "AAACCCCC", "AACCCCCC"]

print(minMutation(start, end, bank))  # Output: 3

Time Complexity

Try our interview co-pilot at AlgoAdvance.com