158. Accounts Merge
Problem Statement
Given a list of accounts where each element accounts[i] is a list of strings, where the first element accounts[i][0] is a name, and the rest of the elements are emails representing emails of the account.
Now, we would like to merge these accounts. Two accounts definitely belong to the same person if there is some common email to both accounts. Note that even if two accounts have the same name, they may belong to different people as people could have the same name. A person can have any number of accounts initially, but all of their accounts definitely have the same name.
After merging the accounts, return the accounts in the following format: the first element of each account is the name, and the rest of the elements are emails in sorted order. The accounts themselves can be returned in any order.
Additional information
1 <= accounts.length <= 1000
2 <= accounts[i].length <= 10
1 <= accounts[i][j].length <= 30
accounts[i][0] consists of English letters.
accounts[i][j] (for j > 0) is a valid email.
Example 1:
Input: accounts = [["John","[email protected]","[email protected]"],["John","[email protected]","[email protected]"],["Mary","[email protected]"],["John","[email protected]"]]
Output: [["John","[email protected]","[email protected]","[email protected]"],["Mary","[email protected]"],["John","[email protected]"]]
Explanation: The first and second John share the email "[email protected]", so they are the same person.
The third John and Mary are different people. Their emails are sorted alphabetically.
Example 2:
Input: accounts = [["Gabe","[email protected]","[email protected]","[email protected]"],["Kevin","[email protected]","[email protected]","[email protected]"],["Ethan","[email protected]","[email protected]","[email protected]"],["Hanzo","[email protected]","[email protected]","[email protected]"],["Fern","[email protected]","[email protected]","[email protected]"]]
Output: [["Ethan","[email protected]","[email protected]","[email protected]"],["Gabe","[email protected]","[email protected]","[email protected]"],["Hanzo","[email protected]","[email protected]","[email protected]"],["Kevin","[email protected]","[email protected]","[email protected]"],["Fern","[email protected]","[email protected]","[email protected]"]]
Graph Traversal (Depth-First Search)
Intuition
We can think of this problem as a network or a Graph. Every single email is a "node". If two emails appear in the exact same account list, we draw a line (an "edge") connecting them.
If John has Account A with [email1, email2] and Account B with [email2, email3], then email1 is connected to email2, and email2 is connected to email3. By traversing this graph, we can find all the emails that are chained together into a single "connected component".
We can map this out using an Adjacency List. We connect the first email of an account to all other emails in that same account. Then, we use Depth-First Search (DFS) to walk through the graph, gathering all connected emails into a single list, sorting them, and attaching the owner's name.
Algorithm
- Create an empty dictionary
graph to act as our adjacency list, and a dictionary email_to_name to remember who owns which email.
- Iterate through every
account in accounts:
- Extract the
name (index 0) and the first_email (index 1).
- For every email in the account starting from index 1:
- Map the email to the
name in email_to_name.
- If it is not the
first_email, add an undirected edge between first_email and this email in the graph.
- Initialize a
visited set to keep track of emails we've already processed, and a res list for the final merged accounts.
- Define a recursive
dfs(email, component) function that adds the current email to visited and component, and recursively calls itself for all neighbors in graph[email].
- Iterate through every unique email in
graph:
- If the email is not in
visited, we have found a new isolated person!
- Initialize an empty
component list and call dfs(email, component).
- Sort the
component alphabetically.
- Append
[email_to_name[email]] + component to res.
- Return
res.
def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
graph = {}
email_to_name = {}
# Build the graph and the email-to-name mapping
for acc in accounts:
name = acc[0]
first_email = acc[1]
for i in range(1, len(acc)):
email = acc[i]
if email not in graph:
graph[email] = []
email_to_name[email] = name
# Connect the first email to all other emails in this account
if i > 1:
graph[first_email].append(email)
graph[email].append(first_email)
visited = set()
res = []
# DFS helper function to gather all connected emails
def dfs(node, component):
visited.add(node)
component.append(node)
for neighbor in graph[node]:
if neighbor not in visited:
dfs(neighbor, component)
# Traverse the graph to find all connected components
for email in graph:
if email not in visited:
component = []
dfs(email, component)
res.append([email_to_name[email]] + sorted(component))
return res
Time & Space Complexity
- Time complexity: O(N * K log(N * K)) where N is the number of accounts and K is the maximum length of an account.
- Reason: Building the graph takes O(N * K). Traversing the graph takes O(N * K). The bottleneck is sorting the merged emails at the end, which in the worst case (where all emails belong to one person) takes O(N * K log(N * K)) time.
- Space complexity: O(N * K)
- Reason: The
graph, email_to_name map, and visited set will store every unique email, taking linear space relative to the total number of emails.
Disjoint Set (Union-Find) - Optimal
Intuition
Graph traversal works great, but building an adjacency list and running recursion can be heavy. A more elegant and architecturally standard way to group connected items is a Disjoint Set data structure, also known as Union-Find.
Union-Find keeps track of a "parent" or "leader" for every element. Initially, every email is its own leader. When we process an account, we simply tell the Union-Find structure to "Union" (merge) all the emails in that account together. The structure will intelligently update the pointers so that all emails in the same account (and any overlapping accounts) point to the exact same ultimate "root" leader.
Once all accounts are processed, we just iterate through every email, ask the Union-Find for its ultimate root leader, and group the emails into lists based on that root!
Algorithm
- Create a
UnionFind helper class with a parent dictionary, a find(x) method (with path compression), and a union(x, y) method.
- Initialize the
UnionFind instance and an email_to_name mapping dictionary.
- Iterate through the
accounts:
- Extract the
name and the first_email.
- For every email in the account:
- Map the email to the
name.
- Call
union(first_email, email) to group them together under the same root.
- Create a
merged_emails dictionary where the keys are the root leader emails, and the values are lists of all emails belonging to that root.
- Iterate through every unique email in
email_to_name:
- Find its
root using the find() method.
- Append the email to
merged_emails[root].
- Construct the final result array by sorting the email lists and prepending the owner's name.
class UnionFind:
def __init__(self):
self.parent = {}
def find(self, x):
# Initialize x to point to itself if not seen before
if x not in self.parent:
self.parent[x] = x
# Path compression
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x, y):
rootX = self.find(x)
rootY = self.find(y)
if rootX != rootY:
self.parent[rootY] = rootX
def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
uf = UnionFind()
email_to_name = {}
# Group all connected emails
for acc in accounts:
name = acc[0]
first_email = acc[1]
for i in range(1, len(acc)):
email = acc[i]
email_to_name[email] = name
uf.union(first_email, email)
# Gather emails by their ultimate root parent
merged_emails = {}
for email in email_to_name:
root = uf.find(email)
if root not in merged_emails:
merged_emails[root] = []
merged_emails[root].append(email)
# Format the final result
res = []
for root, emails in merged_emails.items():
res.append([email_to_name[root]] + sorted(emails))
return res
Time & Space Complexity
- Time complexity: O(N * K log(N * K))
- Reason: The Union-Find operations
find and union take near-constant time O(α(N)) due to path compression. The time complexity is completely dominated by the final sorting step of the merged email lists, which takes O(N * K log(N * K)).
- Space complexity: O(N * K)
- Reason: The
parent dictionary inside the Union-Find class, the email_to_name map, and the merged_emails map all store at most every single email provided in the input.