algoadvance

Given an array of strings words and an integer k, return the k most frequent strings. Return the answer sorted by frequency from highest to lowest. Sort the words with the same frequency by their lexicographical order.

Clarifying Questions

  1. Input Size Constraints:
    • What is the maximum length of the array words?
    • What is the length limit for each string in words?
  2. Output Specifics:
    • Should the output be in a specific format (e.g., list, array)?
    • How to handle ties in frequency?
  3. Edge Cases:
    • Are there any special characters in the strings?
    • Can k be larger than the number of unique strings in words?

Strategy

  1. Frequency Count:
    • Use a dictionary to count the frequency of each word in the input list words.
  2. Sorting:
    • Use a heap to keep track of the top k elements based on frequency. For words with the same frequency, use lexicographical order to prioritize.
  3. Heap Data Structure:
    • Use a min-heap for maintaining the top k elements since it allows for efficient insertion and removal.
  4. Final Output:
    • Extract elements from the heap and sort them to get the correct order (since min-heap gives the smallest elements first).

Time Complexity

  1. Frequency Count: O(N), where N is the number of words in the input list.
  2. Heap Operations: O(M log K), where M is the number of unique words and K is the size of the heap.
  3. Sorting the Result: O(K log K) since we need to sort at the final stage.

The overall time complexity is approximately O(N + M log K + K log K).

Code

from typing import List
from collections import Counter
import heapq

def topKFrequent(words: List[str], k: int) -> List[str]:
    # Step 1: Count the frequency of each word
    freq = Counter(words)
    
    # Step 2: Use a heap to find the k most frequent words
    heap = []
    for word, count in freq.items():
        # Python's heapq module creates a min-heap, we use negative count to get max-heap behavior
        heapq.heappush(heap, (-count, word))
    
    # Step 3: Extract the top k frequent words from the heap
    result = []
    for _ in range(k):
        result.append(heapq.heappop(heap)[1])
    
    return result

# Example usage:
words = ["i", "love", "leetcode", "i", "love", "coding"]
k = 2
print(topKFrequent(words, k))  # Output: ["i", "love"]

This code will provide the k most frequent words in the correct order, while handling ties by lexicographical sorting.

Additional Comments

Try our interview co-pilot at AlgoAdvance.com