Given a list of directory paths, where each path is a string in the format:
"root/d1/d2/.../dn file1.txt(content1) file2.txt(content2) ... fileN.txt(contentN)"
You need to find all the groups of duplicate files in the file system in terms of their content.
A single directory path will contain at least one file. Each file entry consists of a filename followed by its content, with the content wrapped in parentheses. Files in different directories can have the same content.
Return a list of groups of duplicate files. For each group, return the list of file paths that have the same content.
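To make the format concrete, here is a minimal parsing sketch (the string and variable names are illustrative, not part of the problem statement):

entry = "root/a 1.txt(abcd) 2.txt(efgh)"   # illustrative directory string
root_dir, *file_tokens = entry.split(" ")  # "root/a", then one token per file
for token in file_tokens:
    name, content = token[:-1].split("(")  # drop trailing ')' then split on '('
    print(root_dir + "/" + name, "->", content)
# root/a/1.txt -> abcd
# root/a/2.txt -> efgh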
With the format clear, the solution is as follows:
from collections import defaultdict

def findDuplicate(paths):
    # Map file content -> list of full file paths that share it
    content_map = defaultdict(list)
    for path in paths:
        parts = path.split(" ")
        root_dir = parts[0]
        for file_info in parts[1:]:
            # Split "name.txt(content)" into the filename and its content
            file_name, file_content = file_info.split("(")
            file_content = file_content[:-1]  # drop the closing ')'
            # Build the full file path and group it by content
            full_path = root_dir + "/" + file_name
            content_map[file_content].append(full_path)
    # Keep only contents that appear in more than one file
    return [file_paths for file_paths in content_map.values() if len(file_paths) > 1]
# Example usage
paths = [
    "root/a 1.txt(abcd) 2.txt(efgh)",
    "root/c 3.txt(abcd)",
    "root/c/d 4.txt(efgh)",
    "root 4.txt(efgh)",
]
print(findDuplicate(paths))
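For reference, the example above produces the following grouping (the order of groups, and of paths within a group, follows insertion order and may differ in principle):

# [['root/a/1.txt', 'root/c/3.txt'],
#  ['root/a/2.txt', 'root/c/d/4.txt', 'root/4.txt']]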
Overall, the running time is linear in the input: every character of every directory string is processed a constant number of times, giving O(C) time for total input length C, or equivalently O(N) per-file operations for N total files across all directories. The content-to-paths dictionary uses O(C) extra space.