Automatic deletion of duplicates of files

umapati
Posts: 12
Joined: 11 Aug 2011 13:40

Automatic deletion of duplicates of files

Post by umapati »

How can I delete duplicates automatically? Say I have found a list of duplicate files and want to delete, from each pair, the one with the shorter path.

highend
Posts: 14571
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: Automatic deletion of duplicates of files

Post by highend »

E.g. if all files have different path lengths:


$duplicates = <<<>>>
D:\Temp\I am a duplicate.zip
D:\Temp\subfolder\I am a duplicate.zip
D:\C++\Hello World.cpp
D:\Hello World.cpp
>>>;

    $delList = "";
    foreach($file, $duplicates, "<crlf>") {
        if !($file) { continue; }
        // Remove the current file from the list so it can't match itself
        $duplicates = replace($duplicates, $file);
        // Same file name in another folder; the "\\" before the name stops a.txt from matching data.txt
        $match = regexmatches($duplicates, "^.*\\" . regexEscape(getpathcomponent($file, "file")) . "$");
        if ($match) {
            // Queue the one with the shorter path for deletion
            $delFile = (strlen($file) > strlen($match)) ? $match : $file;
            $delList = $delList . $delFile . "|";
        }
    }
    // Only confirm and delete when the list is non-empty, so delete never runs with an empty item list
    if ($delList) {
        if (confirm("Do you really want to delete these files:||$delList", "|", 2)) { delete 1, 0, $delList; }
    }

function regexEscape($string) {
    return regexreplace($string, "([\\^$.+*|?(){\[])", "\$1");
}
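For anyone who wants to check the pairing rule outside XYplorer, here is a minimal Python sketch of the same idea (an illustration, not part of the script above). It assumes, like the script, that every file name occurs exactly twice with different path lengths, and it compares file names case-insensitively on the assumption that Windows treats them that way; `shorter_of_each_pair` is a hypothetical helper name. Grouping by the bare file name instead of matching the end of each line also keeps data.txt from counting as a duplicate of a.txt.

```python
from pathlib import PureWindowsPath


def shorter_of_each_pair(paths):
    """Return, for each pair of same-named files, the path that is shorter.

    Hypothetical helper: assumes each file name occurs exactly twice and the
    two paths differ in length (the same assumptions as the script above).
    """
    by_name = {}
    for p in paths:
        # Group by bare file name; lowercased because Windows names are case-insensitive
        name = PureWindowsPath(p).name.lower()
        by_name.setdefault(name, []).append(p)

    to_delete = []
    for group in by_name.values():
        if len(group) == 2:  # groups of any other size are skipped
            a, b = group
            to_delete.append(a if len(a) < len(b) else b)
    return to_delete


if __name__ == "__main__":
    duplicates = [
        r"D:\Temp\I am a duplicate.zip",
        r"D:\Temp\subfolder\I am a duplicate.zip",
        r"D:\C++\Hello World.cpp",
        r"D:\Hello World.cpp",
    ]
    for path in shorter_of_each_pair(duplicates):
        print(path)
```

With the example list it should print D:\Temp\I am a duplicate.zip and D:\Hello World.cpp, the same two paths the script collects in $delList.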
What's the rule for files with the same path length, e.g. C:\a.txt and D:\a.txt?
What happens when there are more than two files with the same name?
For example:
C:\a.txt
D:\Temp\a.txt
E:\Documents\a.txt
...
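Both cases can be detected before anything is deleted. A rough Python sketch follows; `problem_groups` is a hypothetical helper, and comparing file names case-insensitively is an assumption about Windows. It groups paths by file name and reports the groups a pair-only, strict-length rule cannot handle: names that occur more than twice, and names where several paths share the longest length, so "keep the longest" is ambiguous.

```python
from pathlib import PureWindowsPath


def problem_groups(paths):
    """Return (too_many, ties) for a list of duplicate paths.

    too_many: file name -> paths, for names that occur more than twice.
    ties: file name -> paths, for names where more than one path has the
          maximum length.
    """
    groups = {}
    for p in paths:
        # Assumed: file names compared case-insensitively, as on Windows
        groups.setdefault(PureWindowsPath(p).name.lower(), []).append(p)

    too_many = {name: g for name, g in groups.items() if len(g) > 2}

    ties = {}
    for name, g in groups.items():
        longest = max(len(p) for p in g)
        if sum(1 for p in g if len(p) == longest) > 1:
            ties[name] = g
    return too_many, ties


if __name__ == "__main__":
    print(problem_groups([r"C:\a.txt", r"D:\a.txt"]))
    print(problem_groups([r"C:\a.txt", r"D:\Temp\a.txt", r"E:\Documents\a.txt"]))
```

The first call reports C:\a.txt and D:\a.txt as a tie; the second reports a.txt as occurring three times, with no tie for the longest path.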
One of my scripts helped you out? Please donate via Paypal

Papoulka
Posts: 455
Joined: 13 Jul 2013 23:41

Re: Automatic deletion of duplicates of files

Post by Papoulka »

Get "DoubleKiller" by Jan Schlüter (Google it). As it happens, this is another fine piece of software from Germany... I bought the Pro version long ago and have been glad of it many times, but I think the free version will do most of what you want. Note that development has evidently stopped, so both versions may disappear at any time. A great tool.

highend

Re: Automatic deletion of duplicates of files

Post by highend »

Does it have an option to delete the duplicates that have the shorter path? I can't see one...

Papoulka

Re: Automatic deletion of duplicates of files

Post by Papoulka »

DoubleKiller will not explicitly pick out the shorter paths, though its default output could turn out that way, so it might just do this particular job. That won't beat one of your ad-hoc scripts :wink: but it's a great tool to have for de-duping in general.

highend

Re: Automatic deletion of duplicates of files

Post by highend »

DoubleKiller presents paths sorted alphabetically (and you can't change that), so there is no way to achieve what the OP wants (automatic deletion).

E.g.:
D:\Temp\Username-Annabelle\a.txt
E:\Temp\Username-Beca\a.txt

For a match like this you have to set the checkmarks manually, even if many of the matches can probably be selected with "Check the first duplicate".
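The problem with a purely alphabetical list shows up with the two paths above. In this small Python sketch, `first_alphabetically` is a hypothetical stand-in for "Check the first duplicate", assuming it simply takes the first entry of an alphabetically sorted group (an assumption about DoubleKiller, not something from its documentation). Here the shortest path is the other entry, so the alphabetical pick would mark the longer path.

```python
def first_alphabetically(group):
    # Assumed model of "Check the first duplicate": the first entry
    # after a case-insensitive alphabetical sort
    return sorted(group, key=str.lower)[0]


def shortest(group):
    # The file the OP actually wants to delete: the one with the shortest path
    return min(group, key=len)


if __name__ == "__main__":
    group = [
        r"E:\Temp\Username-Beca\a.txt",
        r"D:\Temp\Username-Annabelle\a.txt",
    ]
    print("alphabetical pick:", first_alphabetically(group))
    print("shortest path:    ", shortest(group))
```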

But... XY already has a good duplicate finder, and it lets you run scripts on the results. Don't reinvent the wheel :)

highend

Re: Automatic deletion of duplicates of files

Post by highend »

A better version that takes my questions from the second post into account:

Works for more than two matches and for matches with the same path length.


    $duplicates = <<<>>>
D:\Temp\I am a duplicate.zip
D:\Temp\subfolder1\I am a duplicate.zip
D:\Temp\subfolder2\I am a duplicate.zip
D:\C++\Hello World.cpp
D:\Hello World.cpp
>>>;

    $delList = $duplicates; // Make a copy of the duplicate list
    foreach($file, $duplicates, "<crlf>") {
        if !($file) { continue; }
        // The "\\" before the name stops a.txt from matching data.txt
        $pattern = "^.*\\" . regexEscape(getpathcomponent($file, "file")) . "$";
        $matches = regexmatches($duplicates, $pattern);

        if ($matches) {
            $maxLen = 0;
            foreach($match, $matches) { // Keep only the match with the longest path; on a tie the first one wins
                $curLen = strlen($match);
                if ($curLen > $maxLen) { $keepFile = $match; $maxLen = $curLen; }
            }
            $delList = regexreplace($delList, "^" . regexEscape($keepFile) . "$"); // Remove the file to keep from the delete list
            $duplicates = regexreplace($duplicates, $pattern); // Remove all matches from the original list
        }
    }
    $delList = replace(formatlist($delList, "e", "<crlf>"), "<crlf>", "|");
    // Only confirm and delete when the list is non-empty, so delete never runs with an empty item list
    if ($delList) {
        if (confirm("Do you really want to delete these files:||$delList", "|", 2)) { delete 1, 0, $delList; }
    }

function regexEscape($string) {
    return regexreplace($string, "([\\^$.+*|?(){\[])", "\$1");
}
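The same keep-the-longest rule as a minimal Python sketch, for checking the logic outside XYplorer (`delete_list` is a hypothetical name). Each path is compared with the longest path kept so far for its file name and replaces it only when strictly longer, so on a tie the first path wins, as with the strict `>` in the script. Everything that was not kept is returned for deletion. File names are compared case-insensitively, which is an assumption about Windows.

```python
from pathlib import PureWindowsPath


def delete_list(paths):
    """Per file name, keep the single longest path and return all the others.

    Ties on length keep the first path encountered.
    """
    keep = {}  # lowercased file name -> longest path seen so far
    for p in paths:
        name = PureWindowsPath(p).name.lower()
        current = keep.get(name)
        # Compare with the longest path so far, not just the previous one
        if current is None or len(p) > len(current):
            keep[name] = p
    kept = set(keep.values())
    return [p for p in paths if p not in kept]


if __name__ == "__main__":
    duplicates = [
        r"D:\Temp\I am a duplicate.zip",
        r"D:\Temp\subfolder1\I am a duplicate.zip",
        r"D:\Temp\subfolder2\I am a duplicate.zip",
        r"D:\C++\Hello World.cpp",
        r"D:\Hello World.cpp",
    ]
    for path in delete_list(duplicates):
        print(path)
```

With the example list it keeps D:\Temp\subfolder1\I am a duplicate.zip and D:\C++\Hello World.cpp and returns the other three paths.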
