algoadvance

Problem Statement

820. Short Encoding of Words

Given a list of words, words, each word can be written as a reference string (i.e., a string that directly references some previous written string in the encoding). Specifically, the reference string for a word-a can either be the word-a itself, or a suffix of word-a. More formally, a suffix of the reference string of a word is formed as follows:

If word[i:] is present in the encoding list, then word can be encoded using word[i:].

Your task is to find the length of the shortest reference string that can encode all words in the given list.

Example 1:

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: "time" and "bell" both have no proper suffixes in the list, while "me" is a suffix of "time". So the encoding is "time#bell#" and its length is 10.

Clarifying Questions

Can the list of words contain duplicates?
Is there a limit on the number or length of the words in the input list?
Can we assume all input words are lower case?
Are words sorted in any specific order?
What is the expected output if the input list is empty?

Based on common constraints in problems like this, we can assume:

No duplicates in the list.
Words contain only lowercase English letters.
If the list is empty, the expected output would be 0.

Strategy

Remove Suffixes:
- We first eliminate any word that is a suffix of another word, because storing such words explicitly is redundant.
Set Data Structure:
- We’ll use a set data structure to keep track of unique suffixes.
- For each word, we’ll add all possible suffixes to the set.
Encoding Calculation:
- Once we have the unique suffixes, the length of the encoded reference string can be calculated as the sum of the lengths of these unique words plus one for each to account for the “#” character.

Steps:

Initialize an empty set for suffix collection.
For each word in the list, generate all possible suffixes and add them to the set.
Compute the length of the encoding as the sum of lengths of all words in the set plus the number of words in the set (each word ends with ‘#’).

Code

#include <iostream>
#include <vector>
#include <unordered_set>
#include <string>

using namespace std;

int minimumLengthEncoding(vector<string>& words) {
    unordered_set<string> uniqueSuffixes;
    
    // Add all suffixes of each word to the set
    for (const auto& word : words) {
        for (int i = 0; i < word.size(); ++i) {
            uniqueSuffixes.insert(word.substr(i));
        }
    }
    
    // Remove suffixes that are themselves suffixes of others
    for (const auto& word : words) {
        for (int i = 1; i < word.size(); ++i) {
            uniqueSuffixes.erase(word.substr(i));
        }
    }
    
    // Calculate the length of the final encoded string
    int totalLength = 0;
    for (const auto& suffix : uniqueSuffixes) {
        totalLength += suffix.size() + 1; // +1 for the '#' character
    }
    
    return totalLength;
}

int main() {
    vector<string> words = {"time", "me", "bell"};
    cout << "Output: " << minimumLengthEncoding(words) << endl;
    return 0;
}

Time Complexity

The time complexity of this solution is influenced by:

Iterating through each word and all its suffixes, which is O(n*k^2) in the worst case where n is the number of words and k is the average length of words.
Inserting strings into a set and removing them which has O(1) average time complexity due to hashing.

Thus, the overall time complexity is O(n*k^2). However, this should be efficient for typical word list constraints found in coding interview problems.

Cut your prep time in half and DOMINATE your interview with AlgoAdvance AI

Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?