algoadvance

Given a list of accounts where each element accounts[i] is a list of strings, where the first element accounts[i][0] is a name, and the rest of the elements are emails representing emails of the account.

Your task is to merge the accounts. Two accounts definitely belong to the same person if there is some common email. Note that even if two accounts have the same name, they may belong to different people, as people could have the same name. A person can have any number of accounts initially, but all their accounts ultimately should be merged into one account. The order in which emails are presented in the returned accounts does not matter.

You need to return the accounts in the following format: the first element of each account is the name, and the rest of the elements are emails in sorted order.

Example:

Input: 
accounts = [["John", "johnsmith@mail.com", "john00@mail.com"], 
            ["John", "johnnybravo@mail.com"], 
            ["John", "johnsmith@mail.com", "john_newyork@mail.com"], 
            ["Mary", "mary@mail.com"]]

Output: [["John", "john00@mail.com", "john_newyork@mail.com", "johnsmith@mail.com"], 
         ["John", "johnnybravo@mail.com"], 
         ["Mary", "mary@mail.com"]]

Clarifying Questions

Is the order of the merged accounts in the output important?
- No, the order of accounts in the final output does not matter.
Should emails in the merged accounts be sorted?
- Yes, emails within each merged account should be sorted lexicographically.

Strategy

We can use the Union-Find (Disjoint Set Union) data structure to solve this problem efficiently. The steps are as follows:

Initialize Data Structures: Create a parent map and a union-find structure to keep track of email connectivity.
Union Phase: For each account, union-find the emails by connecting all emails in the same account.
Find Phase: After all unions are done, we find the connected components of emails (i.e., the connected emails that belong to the same account).
Merge Accounts: Collect the emails from each connected component, sort them and map them back to their original account name.

Code

Here is the Python code that implements the above strategy:

from collections import defaultdict

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            self.parent[rootY] = rootX

def accountsMerge(accounts):
    uf = UnionFind()
    email_to_name = {}

    # Initialize the union-find structure and email_to_name map
    for account in accounts:
        name = account[0]
        first_email = account[1]
        for email in account[1:]:
            if email not in uf.parent:
                uf.parent[email] = email
            if first_email not in uf.parent:
                uf.parent[first_email] = first_email
            uf.union(first_email, email)
            email_to_name[email] = name

    # Find the root email for each connected component
    email_clusters = defaultdict(list)
    for email in uf.parent:
        root_email = uf.find(email)
        email_clusters[root_email].append(email)

    # Construct the result
    result = []
    for emails in email_clusters.values():
        result.append([email_to_name[emails[0]]] + sorted(emails))

    return result

Time Complexity

Union-Find Operations: O(A*α(B)) where:
- A is the number of accounts.
- B is the total number of emails.
- α is the inverse Ackermann function which grows very slowly.
Constructing the email clusters: O(B log B) due to sorting emails within each cluster.

Overall time complexity is approximately O(A + B log B), making the solution efficient given the constraints.

Try our interview co-pilot at AlgoAdvance.com

Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?