algoadvance

Problem Statement

Given a list of words, we want to encode them using the least number of characters. The encoding must be such that each word ends uniquely with a "#" character to delimit its end. A word can be part of another word’s encoding. For example: “time”, “me”, “bell” can be encoded as “time#bell#”, because “me” is a suffix of “time”.

The goal is to determine the minimum length of the encoded string that contains all the given words as suffixes.

Example:

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: "time#bell#" 

Clarifying Questions

Can words contain duplicates in the input list?
- Yes, but duplicate words should be counted only once.
Are there any constraints on the lengths of the words or the number of words?
- The length of each word is between 1 and 7.
- The total number of words will not exceed 2000.

Strategy

Tries Data Structure: Use a Trie (prefix tree) for efficiently identifying suffix relationships between words.
Reverse Words: Insert each word in the Trie in reversed order. This way, we can easily check if a word is a suffix of another.
Identify Leaves: After building the Trie, any leaf node that represents the end of an inserted word contributes to the encoding length. Each leaf node represents a unique suffix.
Calculate the Encoding Length: Sum up the lengths of all unique suffixes (leaf nodes) in the Trie, adding one for each "#" character.

Code

import java.util.HashSet;
import java.util.Set;

public class Solution {
    public int minimumLengthEncoding(String[] words) {
        // Use a set to collect all unique words
        Set<String> uniqueWords = new HashSet<>();
        for (String word : words) {
            uniqueWords.add(word);
        }

        // Create a Trie
        TrieNode root = new TrieNode();

        // Insert words into the Trie in reverse order
        for (String word : uniqueWords) {
            insertReverse(root, word);
        }

        // Calculate the minimum length encoding
        return countEncodedLength(root);
    }

    private void insertReverse(TrieNode root, String word) {
        TrieNode node = root;
        for (int i = word.length() - 1; i >= 0; i--) {
            char ch = word.charAt(i);
            if (!node.children.containsKey(ch)) {
                node.children.put(ch, new TrieNode());
            }
            node = node.children.get(ch);
        }
        node.isEndOfWord = true;
    }

    private int countEncodedLength(TrieNode root) {
        return countEncodedLength(root, 1);
    }

    private int countEncodedLength(TrieNode node, int depth) {
        if (node.children.isEmpty()) {
            return depth;
        }

        int length = 0;
        for (TrieNode child : node.children.values()) {
            length += countEncodedLength(child, depth + 1);
        }
        return length;
    }

    // Definition for a TrieNode
    class TrieNode {
        boolean isEndOfWord;
        Map<Character, TrieNode> children;

        TrieNode() {
            isEndOfWord = false;
            children = new HashMap<>();
        }
    }
}

Time Complexity

Insertion into Trie: For each word of maximum length L, inserting into the Trie takes O(L). Inserting all words will thus take O(W * L), where W is the number of words.
Calculating Length: Traversing the Trie to sum up the encoding length is O(W * L) as well since we might visit each character of each word at most once.

Overall, the time complexity of the solution is O(W * L).

Cut your prep time in half and DOMINATE your interview with AlgoAdvance AI

Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?