Leetcode 609. Find Duplicate File in System
Given a list of directory info including directory path and all the files with contents in this directory, you need to find all the duplicate files in the file system in terms of their contents.
A group of duplicate files consists of at least two files that have exactly the same content.
A single directory info string in the input list has the following format:
"root/d1/d2/.../dn f1(content1) f2(content2) ... fk(contentk)"
It means there are k
files (f1
, f2
, … fk
) with content content1
, content2
, … contentk
respectively in directory root/d1/d2/.../dn
.
Note:
content
corresponds to a unique file identifier.The output is a list of groups of duplicate file paths. For each group, we should have at least two paths, and paths should be returned in a single list within the group.
Example 1:
Input:
[
"root/a 1.txt(abcd) 2.txt(efgh)",
"root/c 3.txt(abcd)",
"root/c/d 4.txt(efgh)",
"root 4.txt(efgh)"
]
Output:
[
["root/a/1.txt","root/c/3.txt"],
["root/a/2.txt","root/c/d/4.txt","root/4.txt"]
]
)
.import java.util.*;
public class FindDuplicateFileInSystem {
public List<List<String>> findDuplicate(String[] paths) {
// HashMap to store file contents as key and list of their paths as value
HashMap<String, List<String>> contentToFilePaths = new HashMap<>();
for (String path : paths) {
// Split the directory information.
String[] parts = path.split(" ");
String dirPath = parts[0];
// Parse each file in the current directory.
for (int i = 1; i < parts.length; i++) {
int openParen = parts[i].indexOf("(");
String fileName = parts[i].substring(0, openParen);
String content = parts[i].substring(openParen + 1, parts[i].length() - 1);
String filePath = dirPath + "/" + fileName;
contentToFilePaths
.computeIfAbsent(content, k -> new ArrayList<>())
.add(filePath);
}
}
// Collect all groups of duplicate files.
List<List<String>> duplicates = new ArrayList<>();
for (List<String> filePaths : contentToFilePaths.values()) {
if (filePaths.size() > 1) {
duplicates.add(filePaths);
}
}
return duplicates;
}
public static void main(String[] args) {
FindDuplicateFileInSystem solution = new FindDuplicateFileInSystem();
String[] input = {
"root/a 1.txt(abcd) 2.txt(efgh)",
"root/c 3.txt(abcd)",
"root/c/d 4.txt(efgh)",
"root 4.txt(efgh)"
};
List<List<String>> result = solution.findDuplicate(input);
System.out.println(result);
}
}
Overall, the algorithm runs in O(N) time complexity.
Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?