algoadvance

Given a paragraph and a list of banned words, return the most frequent word that is not banned. It is guaranteed there is at least one word that isn’t banned, and that the answer is unique.

The words in the paragraph are case-insensitive and the answer should be returned in lowercase.

Example 1:

Note:

Clarifying Questions

  1. Can there be capital letters in the paragraph?
    • Yes, words in the paragraph can be case-insensitive.
  2. Are punctuation marks considered part of the words?
    • No, punctuation marks should be ignored when identifying words.

Strategy

  1. Normalization:
    • Convert the paragraph into a uniform case (lowercase) to handle case insensitivity.
    • Remove punctuations to have a clean list of words.
  2. Splitting the Paragraph:
    • Split the paragraph based on spaces to get individual words.
  3. Frequency Count:
    • Use a dictionary to count the frequency of each word in the paragraph.
  4. Exclusion of Banned Words:
    • Iterate through the frequency dictionary and exclude words present in the banned list.
  5. Finding the Most Common Word:
    • Identify the word with the highest frequency that is not in the banned list.

Code

import re
from collections import defaultdict

def mostCommonWord(paragraph, banned):
    # Normalize the paragraph to lowercase
    paragraph = paragraph.lower()
    
    # Replace punctuations with spaces using regex
    paragraph = re.sub(r'[^\w\s]', ' ', paragraph)
    
    # Split paragraph into words
    words = paragraph.split()
    
    # Freq dict to count occurrences of each word
    word_count = defaultdict(int)
    
    for word in words:
        word_count[word] += 1
    
    # Remove banned words from the frequency dict
    for word in banned:
        if word in word_count:
            del word_count[word]
    
    # Find the word with the maximum frequency
    most_common = max(word_count, key=word_count.get)
    
    return most_common

# Example usage
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]
print(mostCommonWord(paragraph, banned))  # Output: "ball"

Time Complexity

  1. Normalization: O(n) where n is the length of the paragraph string due to conversion to lowercase and regex replacement.
  2. Split the Paragraph: O(n) where n is the length of the cleaned paragraph string.
  3. Frequency Count: O(m) where m is the number of words in the paragraph.
  4. Exclusion: O(b) where b is the number of banned words.
  5. Finding the Most Common: O(k) where k is the number of unique words in the frequency dictionary.

Overall, the time complexity is O(n + m + b + k), which simplifies generally to O(n) assuming m, b, and k are small compared to n.

Try our interview co-pilot at AlgoAdvance.com