Leetcode 609. Find Duplicate File in System
Given a list of directory info, including the directory path and all the files with contents in this directory, return all the duplicate files in the file system in terms of their paths.
A group of duplicate files consists of at least two files that have the same content.
A single directory info string in the input list has the following format:
It means there are n
files (f1.txt, f2.txt … fn.txt) with content (f1_content, f2_content … fn_content) respectively in the directory “root/d1/d2/…/dm”.
Note:
directory
and the files.directory
contains only alphabets and numbers.files
contains only alphabets and numbers.files content
is surrounded by parentheses.Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[ ["root/a/1.txt","root/c/3.txt"], ["root/a/2.txt","root/c/d/4.txt","root/4.txt"] ]
Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)"]
Output:
[ ["root/a/1.txt","root/c/3.txt"], ["root/a/2.txt","root/c/d/4.txt"] ]
Let’s assume:
#include <vector>
#include <string>
#include <unordered_map>
#include <sstream>
using namespace std;
vector<vector<string>> findDuplicate(vector<string>& paths) {
unordered_map<string, vector<string>> contentToFilePaths;
for (const string& path : paths) {
istringstream ss(path);
string dir;
ss >> dir;
string fileInfo;
while (ss >> fileInfo) {
size_t openParen = fileInfo.find('(');
size_t closeParen = fileInfo.find(')');
string fileName = fileInfo.substr(0, openParen);
string fileContent = fileInfo.substr(openParen + 1, closeParen - openParen - 1);
contentToFilePaths[fileContent].emplace_back(dir + '/' + fileName);
}
}
vector<vector<string>> duplicates;
for (const auto& entry : contentToFilePaths) {
if (entry.second.size() > 1) {
duplicates.emplace_back(entry.second);
}
}
return duplicates;
}
O(n*k)
where n
is the number of directory entries and k
is the average length of directory/files/contents.O(n*k)
.O(n*k)
.O(n*k)
additional space.Thus, both the time and space complexity are O(n*k)
.
Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?