LeetCode 609. Find Duplicate File in System

Problem Statement

Given a list of directory info strings, each containing a directory path and all of the files (with their contents) in that directory, find all the duplicate files in the file system in terms of their contents.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dn f1(content1) f2(content2) ... fk(contentk)"

It means there are k files (f1, f2, … fk) with contents content1, content2, … contentk respectively in directory root/d1/d2/.../dn.
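
For example, a minimal sketch of how one such string decomposes (the class and variable names here are illustrative):

public class ParseSketch {
    public static void main(String[] args) {
        String info = "root/a 1.txt(abcd) 2.txt(efgh)";
        String[] parts = info.split(" ");
        String dirPath = parts[0]; // "root/a"
        for (int i = 1; i < parts.length; i++) {
            int open = parts[i].indexOf('(');
            String fileName = parts[i].substring(0, open);
            String content = parts[i].substring(open + 1, parts[i].length() - 1);
            System.out.println(dirPath + "/" + fileName + " -> " + content);
        }
        // Prints:
        // root/a/1.txt -> abcd
        // root/a/2.txt -> efgh
    }
}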

Note:

The output is a list of groups of duplicate file paths; each group must contain at least two paths, collected into a single list.

Example

Example 1:

Input:
[
  "root/a 1.txt(abcd) 2.txt(efgh)",
  "root/c 3.txt(abcd)",
  "root/c/d 4.txt(efgh)",
  "root 4.txt(efgh)"
]

Output: 
[
  ["root/a/1.txt","root/c/3.txt"],
  ["root/a/2.txt","root/c/d/4.txt","root/4.txt"]
]

Clarifying Questions

  1. Can the contents of the files contain spaces or special characters?
    • No. Per the problem constraints, contents contain no spaces or parentheses, so splitting on spaces and slicing at the enclosing parentheses is safe.
  2. Is content comparison case-sensitive?
    • Yes. Contents are compared as exact strings, so the comparison is case-sensitive.
  3. Should we return the results in a specific order?
    • No specific order is required for the output lists.
  4. Can there be nested directories?
    • Yes, directories can be nested.

Strategy

  1. Parse Input:
    • Iterate over each directory string.
    • Split the string to extract the directory path and all the files with their contents.
  2. Organize Data:
    • Use a HashMap where each key is the content of the file and the value is a list of file paths.
  3. Filter Duplicates:
    • Iterate over the HashMap and collect lists that have more than one file path (a stream-based sketch of this step follows the list).
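
A stream-based sketch of the filtering step (the class name, method name, and map parameter below are illustrative, not part of the solution that follows):

import java.util.*;
import java.util.stream.Collectors;

class DuplicateFilterSketch {
    // Keep only the groups that contain at least two paths, i.e., actual duplicates.
    static List<List<String>> filterDuplicates(Map<String, List<String>> contentToFilePaths) {
        return contentToFilePaths.values().stream()
                .filter(group -> group.size() > 1)
                .collect(Collectors.toList());
    }
}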

Code

import java.util.*;

public class FindDuplicateFileInSystem {
    public List<List<String>> findDuplicate(String[] paths) {
        // HashMap to store file contents as key and list of their paths as value
        HashMap<String, List<String>> contentToFilePaths = new HashMap<>();
        
        for (String path : paths) {
            // Split the directory information.
            String[] parts = path.split(" ");
            String dirPath = parts[0];
            
            // Parse each file in the current directory.
            for (int i = 1; i < parts.length; i++) {
                int openParen = parts[i].indexOf("(");
                String fileName = parts[i].substring(0, openParen);
                String content = parts[i].substring(openParen + 1, parts[i].length() - 1);
                
                String filePath = dirPath + "/" + fileName;
                
                contentToFilePaths
                    .computeIfAbsent(content, k -> new ArrayList<>())
                    .add(filePath);
            }
        }
        
        // Collect all groups of duplicate files.
        List<List<String>> duplicates = new ArrayList<>();
        for (List<String> filePaths : contentToFilePaths.values()) {
            if (filePaths.size() > 1) {
                duplicates.add(filePaths);
            }
        }
        
        return duplicates;
    }

    public static void main(String[] args) {
        FindDuplicateFileInSystem solution = new FindDuplicateFileInSystem();
        String[] input = {
            "root/a 1.txt(abcd) 2.txt(efgh)",
            "root/c 3.txt(abcd)",
            "root/c/d 4.txt(efgh)",
            "root 4.txt(efgh)"
        };
        List<List<String>> result = solution.findDuplicate(input);
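        // Expected output (group order may vary, since HashMap iteration order is unspecified):
        // [[root/a/1.txt, root/c/3.txt], [root/a/2.txt, root/c/d/4.txt, root/4.txt]]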
        System.out.println(result);
    }
}
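
A note on the design: computeIfAbsent creates each content's list lazily on first encounter, which keeps the grouping step to a single expression. Because paths are appended to an ArrayList, paths within each group appear in input order, while the order of the groups themselves follows HashMap iteration and is unspecified.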

Time Complexity

The algorithm runs in O(N) time, where N is the total length of all input strings: each character is processed a constant number of times while splitting, slicing out file names and contents, and hashing. Space complexity is also O(N) for the map from contents to file paths.
