Automatically search for duplicate dupes files by scripts

hyperman
Posts: 3
Joined: 20 Aug 2020 08:08

Post by hyperman »

Hello,

I am trying to write a script that automatically finds duplicate files by name, ignoring the extension, and selects them so that I can delete or move them.
For example (the real file list is much longer):

video_1.avi, 20 MB
video_2.avi, 40 MB
video_2.mp4, 5 MB
video_3.mpeg, 80 MB
video_3.mov, 50 MB
video_3.webm, 1 MB
video_3.avi, 30 MB
video_3.mp4, 3 MB
video_4.avi, 60 MB

In this example the dupes are video_2 (two files) and video_3 (five files), each with different sizes and extensions.
I want to keep only the smallest file of each group of dupes, like this:

video_1.avi, 20 MB
video_2.mp4, 5 MB
video_3.webm, 1 MB
video_4.avi, 60 MB
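
To make the goal concrete, here is a minimal sketch of the intended result, written in Python rather than XYplorer script purely for illustration. The function name `larger_duplicates` and the (name, size) tuples are assumptions made for this example; the sizes are the MB values from the list above.

```python
import os
from collections import defaultdict

def larger_duplicates(files):
    """Return every file that is NOT the smallest one in its base-name group.

    files: list of (name, size) tuples; size is a number.
    """
    groups = defaultdict(list)
    for name, size in files:
        # Group by base name without extension, case-insensitively.
        base = os.path.splitext(name)[0].lower()
        groups[base].append((size, name))

    to_remove = []
    for members in groups.values():
        members.sort()  # ascending by numeric size
        # Keep members[0] (the smallest), collect the rest.
        to_remove.extend(name for _, name in members[1:])
    return to_remove

files = [
    ("video_1.avi", 20), ("video_2.avi", 40), ("video_2.mp4", 5),
    ("video_3.mpeg", 80), ("video_3.mov", 50), ("video_3.webm", 1),
    ("video_3.avi", 30), ("video_3.mp4", 3), ("video_4.avi", 60),
]
print(larger_duplicates(files))
```

Selecting the returned files and deleting or moving them would leave exactly the four files listed above.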


I wrote a script that works reliably,
but it only finds and selects dupes by name.
Please be patient: I am a coding noob...

"Search Dupes Files By Name Ignore Extension And Select Them|:find"
   //variables
   $RESULT = "" ;
   $LIST = "" ;
   $COUNTER = 0 ;
   //make the list only for files
   $LIST = listfolder(,,1, <crlf>);
   //search for dupes names
   foreach( $CURRENT_ITEM , $LIST , "<crlf>") 
      {
      foreach( $NEXT_ITEM , $LIST , "<crlf>") 
         {
         $CURRENT_ITEM_NAME = $CURRENT_ITEM ;
         $NEXT_ITEM_NAME    = $NEXT_ITEM ;
         //only base name no extension
         $CURRENT_ITEM_NAME = getpathcomponent( $CURRENT_ITEM_NAME , "base");
         $NEXT_ITEM_NAME    = getpathcomponent( $NEXT_ITEM_NAME    , "base");
         //compare lower case names and no diacritics
         $CURRENT_ITEM_NAME = recase( $CURRENT_ITEM_NAME , l );
         $NEXT_ITEM_NAME    = recase( $NEXT_ITEM_NAME    , l );
         $CURRENT_ITEM_NAME = recase( $CURRENT_ITEM_NAME , "removediacritics" ); 
         $NEXT_ITEM_NAME    = recase( $NEXT_ITEM_NAME    , "removediacritics" ); 
         //compare
         if ( $CURRENT_ITEM_NAME == $NEXT_ITEM_NAME )
            {
            //COUNTER to compare only two at time
            $COUNTER++ ;
            if ( $COUNTER > 1 ) 
               {
               $RESULT = $RESULT . "" . $CURRENT_ITEM . "<crlf>" ;
               }
            }
         }; 
      $COUNTER = 0 ;
      };
   selectitems $RESULT ;
   //if nothing dupes
   if ( $RESULT == "" )
      {
      status ("No Dupes" , , "select");
      }
But that is not enough...


So I tried to write another script, but it does not work well.
To keep the smallest files, I try to select only the bigger files of each group of dupes.
There are several problems:
- it cannot select ALL the bigger files when a group has more than two dupes (like video_3 in the example)
- sometimes the script completely skips some video dupes, and I don't know why, since the first script works well and filesize() always returns a size for every file (a bug?)

"Search Dupes Files By Name Ignore Extension And Select The Biggest (no final, work only with 2 dupes each, skip sometimes video dupes files)|:find"
   //variables
   $RESULT = "" ;
   $LIST = "" ;
   $COUNTER = 0 ;
   //make the list only for files
   $LIST = listfolder(,,1, <crlf>);
   //search for dupes names
   foreach( $CURRENT_ITEM , $LIST , "<crlf>") 
      {
      foreach( $NEXT_ITEM , $LIST , "<crlf>") 
         {
         $CURRENT_ITEM_NAME = $CURRENT_ITEM ;
         $NEXT_ITEM_NAME    = $NEXT_ITEM ;
         //only base name no extension
         $CURRENT_ITEM_NAME = getpathcomponent( $CURRENT_ITEM_NAME , "base");
         $NEXT_ITEM_NAME    = getpathcomponent( $NEXT_ITEM_NAME    , "base");
         //compare lower case names no diacritics
         $CURRENT_ITEM_NAME = recase( $CURRENT_ITEM_NAME , l );
         $CURRENT_ITEM_NAME = recase( $CURRENT_ITEM_NAME , "removediacritics" ); 
         $NEXT_ITEM_NAME    = recase( $NEXT_ITEM_NAME    , l );
         $NEXT_ITEM_NAME    = recase( $NEXT_ITEM_NAME    , "removediacritics" ); 
         //compare
         if ( $CURRENT_ITEM_NAME == $NEXT_ITEM_NAME )
            {
            //COUNTER for compare only two at time
            $COUNTER++ ;
            if ( $COUNTER > 1 ) 
               {
               //select by size
               $CURRENT_ITEM_SIZE = filesize( $CURRENT_ITEM );
               $NEXT_ITEM_SIZE    = filesize( $NEXT_ITEM );
               if ( $CURRENT_ITEM_SIZE > $NEXT_ITEM_SIZE ) 
                  {
                  $RESULT = $RESULT . "" . $CURRENT_ITEM . "<crlf>" ;
                  }
               }
            }
         }; 
      $COUNTER = 0 ;
      };
   //for text test $RESULT
   //text $RESULT ;
   selectitems $RESULT ;
   //if nothing dupes
   if ( $RESULT == "" )
      {
      status ("No Dupes" , , "select");
      }
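
One possible explanation for both symptoms is the match counter: the size comparison only runs from the second name match onward, so the current file is never compared against the first file of its group in list order. The Python sketch below mirrors that loop structure under two assumptions that are not confirmed by the thread: sizes compare numerically, and the list is processed in the order shown. The function name `select_bigger_as_written` is hypothetical.

```python
import os

def select_bigger_as_written(files):
    """Mirror the nested loops with the match counter.

    files: ordered list of (name, size) tuples.
    """
    result = []
    for cur_name, cur_size in files:
        counter = 0
        cur_base = os.path.splitext(cur_name)[0].lower()
        for nxt_name, nxt_size in files:
            if cur_base == os.path.splitext(nxt_name)[0].lower():
                counter += 1
                # The first match (whichever file it is) is never size-compared.
                if counter > 1 and cur_size > nxt_size:
                    result.append(cur_name)
    return result

# The smallest file comes first in list order.
print(select_bigger_as_written([("clip.avi", 1), ("clip.mp4", 5), ("clip.webm", 9)]))
print(select_bigger_as_written([("a.avi", 1), ("a.mp4", 5)]))
```

In this sketch, whenever the smallest file of a group is listed first, the second-smallest file is only compared with itself and with larger files, so it is never selected; with just two files the whole group is skipped.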
I'm aware of XYplorer's search tab (Ctrl+F) and its dupes section, but I want to do this by script so that it becomes a one-click action...



Thank you so much !!!!!
Any help appreciated

highend
Posts: 14940
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: Automatically search for duplicate dupes files by scripts

Post by highend »

    $files = report("{name}|{size raw}<crlf>", listfolder(, , 1, <crlf>));

    $toKeep = "";
    while ($files) {
        $file    = gettoken($files, 1, <crlf>);
        $base    = gpc(gettoken($file, 1, "|"), "base");
        $escaped = regexreplace($base, "([\\.+(){\[^$])", "\$1");
        // Match all files with the same base (and get their sizes)
        $matches = regexmatches($files, "^$escaped\.[^.]+?\|\d+", <crlf>);
        // Remove them from the $files list
        $files   = regexreplace($files, "^$escaped\.[^.]+?\|\d+(\r?\n|$)");

        // Rearrange and sort matches
        $sMatch  = formatlist(regexreplace($matches, "^(.+?)\|(\d+)", "$2|$1"), "s", <crlf>);
        $toKeep .= gettoken(gettoken($sMatch, 1, <crlf>), 2, "|") . <crlf>;
    }
    if ($toKeep) { selectitems $toKeep; }
    else { status "No dupes found...", "8B4513", "stop"; }
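
The script above escapes the base name before embedding it in a regular expression, then pulls every "name|size" entry with that base out of the list. A rough Python analogue of that one step, for illustration only: `re.escape` stands in for the manual escaping, and the `listing` data and the function name `same_base_entries` are made up for this sketch.

```python
import re

# Hypothetical "name|size" listing, one entry per line.
listing = "\n".join([
    "test_1.avi|29081082",
    "test_1.mp3|2245996",
    "test_10.avi|500",
    "test (1).mp4|42",
])

def same_base_entries(listing, base):
    """Return all entries whose name is `base` plus a single extension."""
    # Escape regex metacharacters in the base name, e.g. "(" or "+".
    pattern = r"^" + re.escape(base) + r"\.[^.|]+\|\d+"
    return re.findall(pattern, listing, flags=re.MULTILINE)

print(same_base_entries(listing, "test_1"))
print(same_base_entries(listing, "test (1)"))
```

Anchoring at the start of the line and requiring a literal dot right after the base keeps "test_10.avi" out of the "test_1" group.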

hyperman
Posts: 3
Joined: 20 Aug 2020 08:08

Re: Automatically search for duplicate dupes files by scripts

Post by hyperman »

I tried your script: it selects the biggest file of each group of dupes, plus all unique files, and it works
(I still have a problem with one video, which is not recognized correctly by any script)
but with that I can improve my first scripts.
Thank you

highend
Posts: 14940
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: Automatically search for duplicate dupes files by scripts

Post by highend »

"I still have a problem with a video, which is not recognized correctly by any script"
What problem is that?

hyperman
Posts: 3
Joined: 20 Aug 2020 08:08

Re: Automatically search for duplicate dupes files by scripts

Post by hyperman »

I found the cause: your script sorts with "s", which the help describes as
s = sort ascending (by default case-insensitive: A==a)
text formatlist("3b|10a|200c", "s"); //10a|200c|3b

So the sizes are compared as text, not as numbers:

2245996|test_1.mp3
29081082|test_1.avi
5981243|test_1.mp4

18238895|test_2.flv
27631976|test_2.avi
5120138|test_2.mp4
795569|test_2.mp3

19630426|test_3.mp4
2315392|test_3.mp3
237122090|test_3.avi
458348544|test_3.mpg
706085|test_3.flv

In this particular case the list should be sorted naturally:
n = sort naturally (can only be case-insensitive!)
With "rn" the result is:

29081082|test_1.avi
5981243|test_1.mp4
2245996|test_1.mp3

27631976|test_2.avi
18238895|test_2.flv
5120138|test_2.mp4
795569|test_2.mp3

458348544|test_3.mpg
237122090|test_3.avi
19630426|test_3.mp4
2315392|test_3.mp3
706085|test_3.flv
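
The difference between the two orderings can be reproduced outside XYplorer. This Python sketch (illustration only, using the test_3 entries from above) sorts the same "size|name" strings once as plain text and once by the numeric value of the size field.

```python
entries = [
    "2315392|test_3.mp3",
    "19630426|test_3.mp4",
    "237122090|test_3.avi",
    "458348544|test_3.mpg",
    "706085|test_3.flv",
]

# Plain text sort: compares character by character, so "1..." < "2..." < "7...".
text_sorted = sorted(entries)

# Numeric sort: compare the size field as an integer.
numeric_sorted = sorted(entries, key=lambda e: int(e.split("|", 1)[0]))

print(text_sorted[0])     # 19630426|test_3.mp4
print(numeric_sorted[0])  # 706085|test_3.flv
```

Taking the first entry after a text sort therefore picks a 19 MB file as the "smallest", while the numeric ordering picks the 706085-byte one.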

thank you !

highend
Posts: 14940
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: Automatically search for duplicate dupes files by scripts

Post by highend »

You probably mean "n", not "rn" (you were asking for the smallest file, not the largest)...
