algoadvance

Given a string s, find the maximum length of a substring that appears at least twice and the two occurrences do not overlap.

Clarifying Questions

  1. Case Sensitivity: Is the search case-sensitive?
    • We will assume that it is case-sensitive unless specified otherwise.
  2. Character Set: What character set is used in the string?
    • We will assume it to be the English alphabet unless there are special characters to consider.
  3. Length of String (s): What is the maximum length of s?
    • It is crucial for understanding the performance needs of our algorithm.

Given the assumptions, let’s move towards the strategy to solve this problem.

Strategy

  1. Sliding Window with Hashing:
    • Use a combination of sliding window and hashing to efficiently check for repeated non-overlapping substrings.
  2. Binary Search:
    • Use binary search to find the maximum length of such substrings. We incrementally increase the length of the substring we are searching for, using binary search to find the exact maximum length.
  3. Rabin-Karp Rolling Hash:
    • For substring checking, use the Rabin-Karp rolling hash technique to quickly compare substring hashes and ensure they appear twice without overlaps.

Code

Here’s how we can implement this strategy:

def max_length_substring_no_overlap(s: str) -> int:
    def search_substr_with_len(L):
        """
        This function returns True if there is a non-overlapping substring of length `L` which appears at least twice
        """
        seen = {}
        base, modulus = 26, 2**32
        
        # Precompute hash for the first window of length `L`
        hash_ = 0
        for i in range(L):
            hash_ = (hash_ * base + ord(s[i])) % modulus
        
        seen[hash_] = [0]  # Initial position of this hash
        
        left_base = pow(base, L, modulus)  # leftmost base for rolling hash
        
        # Rolling hash to compute hash values of all substrings of length `L`
        for start in range(1, len(s) - L + 1):
            hash_ = (hash_ * base - ord(s[start - 1]) * left_base + ord(s[start + L - 1])) % modulus
            
            if hash_ in seen:
                for prev_start in seen[hash_]:
                    if prev_start + L <= start:  # Ensure non-overlapping
                        return True
                seen[hash_].append(start)
            else:
                seen[hash_] = [start]
        
        return False
    
    # Binary search for the maximum length of substring
    low, high, result = 1, len(s), 0
    while low <= high:
        mid = (low + high) // 2
        if search_substr_with_len(mid):
            result = mid
            low = mid + 1
        else:
            high = mid - 1
    return result

# Example usage:
s = "banana"
print(max_length_substring_no_overlap(s))  # Output: 1, for "a"

Time Complexity

  1. Rabin-Karp Rolling Hash:
    • Computing the hash of the initial substring is O(L).
    • Updating the rolling hash for each substring in the search function takes O(L) time for each of the n-L+1 substrings.
    • Hence, each call to search_substr_with_len(L) is O(n), where ‘n’ is the length of the string.
  2. Binary Search:
    • Uses up to O(log n) calls to the search_substr_with_len(L) function.

Thus, the overall time complexity is O(n log n).

This implementation ensures that we efficiently find the maximum length substring that appears at least twice, while ensuring the occurrences do not overlap.

Try our interview co-pilot at AlgoAdvance.com