Automatic deletion of duplicates of files

Posted: 02 Jun 2015 19:28
by umapati
How can I delete duplicates automatically? For example, given a list of duplicate files, I want to delete, from each pair, the one that has the shorter path.

Re: Automatic deletion of duplicates of files

Posted: 02 Jun 2015 20:51
by highend
E.g. if all files have different path lengths:

$duplicates = <<<>>>
D:\Temp\I am a duplicate.zip
D:\Temp\subfolder\I am a duplicate.zip
D:\C++\Hello World.cpp
D:\Hello World.cpp
>>>;

    $delList = "";
    foreach($file, $duplicates, "<crlf>") {
        if !($file) { continue; }
        $duplicates = replace($duplicates, $file);
        $match = regexmatches($duplicates, "^.*?" . regexEscape(getpathcomponent($file, "file")) . "$");
        if ($match) {
            $delFile = (strlen($file) > strlen($match)) ? $match : $file;
            $delList = $delList . $delFile . "|";
        }
    }
    if ($delList) { $confirm = confirm("Do you really want to delete these files:||$delList", "|", 2); }
    if ($confirm) { delete 1, 0, $delList; }

function regexEscape($string) {
    // Prefix each regex metacharacter with a backslash so the file name is matched literally
    return regexreplace($string, "([\\^$.+*|?(){\[])", "\$1");
}
What's the rule for files with the same path length? E.g. C:\a.txt and D:\a.txt?
What happens when there are more than two files with the same name?
Like:
C:\a.txt
D:\Temp\a.txt
E:\Documents\a.txt
...

Re: Automatic deletion of duplicates of files

Posted: 03 Jun 2015 20:31
by Papoulka
Get "Doublekiller" by Jan Schlüter (Google it). This is another fine piece of software from Germany, as it happens... I bought the Pro version long ago and have been glad of it many times, but I think the free version will do most of what you want. Note that development has evidently stopped, so both versions may disappear at any time. A great tool.

Re: Automatic deletion of duplicates of files

Posted: 03 Jun 2015 22:23
by highend
Does it have an option to delete the duplicates that have the shorter path? I can't see one...

Re: Automatic deletion of duplicates of files

Posted: 05 Jun 2015 01:53
by Papoulka
Doublekiller will not explicitly sort out the shorter paths, though its default output could turn out that way, so it just might do this particular job. That won't beat one of your ad-hoc scripts ;) but it's a great tool to have for de-duping in general.

Re: Automatic deletion of duplicates of files

Posted: 05 Jun 2015 10:12
by highend
DoubleKiller presents paths alphabetically sorted (and you can't change that) so there isn't any way to achieve what the OP wants (automatic deletion).

E.g.:
D:\Temp\Username-Annabelle\a.txt
E:\Temp\Username-Beca\a.txt

For a match like this you have to set the checkmarks manually, even though many of the other matches can probably be selected with "Check the first duplicate".

But... XY does already have a good duplicate finder and it allows you to use scripts on the results. Don't reinvent the wheel :)
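
To illustrate the sorting point with the two example paths above, here is a tiny Python sketch (not XYplorer script, and not a model of how DoubleKiller works internally). It only shows that the first path in alphabetical order is not necessarily the one with the shorter path. The variable names are made up for illustration.

```python
paths = [
    r"D:\Temp\Username-Annabelle\a.txt",
    r"E:\Temp\Username-Beca\a.txt",
]

# First entry when the list is sorted alphabetically (case-insensitive)
first_alphabetically = sorted(paths, key=str.lower)[0]

# The entry the OP wants to delete: the one with the shorter path
shortest = min(paths, key=len)

print(first_alphabetically)  # D:\Temp\Username-Annabelle\a.txt
print(shortest)              # E:\Temp\Username-Beca\a.txt
```

Here the alphabetically first path is the longer one, so picking the first entry of each group would select the wrong file for this pair.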

Re: Automatic deletion of duplicates of files

Posted: 05 Jun 2015 11:14
by highend
A better version that takes my questions from the second post into account.

It works when there are more than two files with the same name and when some of the matches have the same path length.

    $duplicates = <<<>>>
D:\Temp\I am a duplicate.zip
D:\Temp\subfolder1\I am a duplicate.zip
D:\Temp\subfolder2\I am a duplicate.zip
D:\C++\Hello World.cpp
D:\Hello World.cpp
>>>;

    $delList = $duplicates; // Make a copy of the duplicate list
    foreach($file, $duplicates, "<crlf>") {
        if !($file) { continue; }
        $pattern = "^.*?" . regexEscape(getpathcomponent($file, "file")) . "$";
        $matches = regexmatches($duplicates, $pattern);

        if ($matches) {
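            $lastLen = 0;
            foreach($match, $matches) { // Compare length of matches and only keep the one with the longest path
                $curLen = strlen($match);
                // Only update the longest length seen so far when a longer path is found; on a tie the first match is kept
                if ($curLen > $lastLen) { $keepFile = $match; $lastLen = $curLen; }
            }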
            $delList = regexreplace($delList, "^" . regexEscape($keepFile) . "$"); // Delete the file to keep from the delete list
            $duplicates = regexreplace($duplicates, $pattern); // Delete all matches from the original list
        }
    }
    $delList = replace(formatlist($delList, "e", "<crlf>"), "<crlf>", "|");
    if ($delList) { $confirm = confirm("Do you really want to delete these files:||$delList", "|", 2); }
    if ($confirm) { delete 1, 0, $delList; }

function regexEscape($string) {
    // Prefix each regex metacharacter with a backslash so the file name is matched literally
    return regexreplace($string, "([\\^$.+*|?(){\[])", "\$1");
}
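
For anyone who wants to check the selection rule outside XYplorer, here is a rough Python sketch (not XYplorer script) of what the second version is meant to do: group the paths by file name, keep the longest path in each group, and put the rest on the delete list. The function name `files_to_delete` is made up, and two details are my own assumptions rather than anything stated above: file names are compared case-insensitively, and when several paths share the longest length the first one listed is kept. Nothing is actually deleted; the function only returns the list.

```python
from collections import defaultdict
from ntpath import basename  # Windows-style path handling on any OS


def files_to_delete(paths):
    """Return every path except the longest one in each same-name group."""
    groups = defaultdict(list)
    for path in paths:
        if path:  # skip empty lines
            # Assumption: names are compared case-insensitively, as on Windows
            groups[basename(path).lower()].append(path)

    delete = []
    for group in groups.values():
        # Index of the longest path; on a tie, max() returns the first one
        keep = max(range(len(group)), key=lambda i: len(group[i]))
        delete.extend(p for i, p in enumerate(group) if i != keep)
    return delete


duplicates = [
    r"D:\Temp\I am a duplicate.zip",
    r"D:\Temp\subfolder1\I am a duplicate.zip",
    r"D:\Temp\subfolder2\I am a duplicate.zip",
    r"D:\C++\Hello World.cpp",
    r"D:\Hello World.cpp",
]

for path in files_to_delete(duplicates):
    print(path)
```

With the sample list from the script, the `subfolder1` copy and the `D:\C++` copy are kept, and the other three paths end up on the delete list.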