Duplicate audio file hash

Twisten
Posts: 204
Joined: 27 Apr 2008 10:30

Duplicate audio file hash

Post by Twisten »

Similar in concept to the image hash, this is an old idea that could find new life here. It's very simple: when searching for duplicate audio files, ignore all metadata included in the file (e.g. ID3 tags for MP3 files) and hash only the audio data. That easily finds duplicates you would otherwise miss when all that changed is some tag.
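To make the idea concrete, here is a minimal Python sketch for the MP3 case only. It assumes the metadata is a leading ID3v2 tag and/or a trailing 128-byte ID3v1 tag; other tag types (APEv2, for example) are not handled, and the names `strip_id3` and `audio_hash` are made up for this illustration.

```python
import hashlib

def strip_id3(data: bytes) -> bytes:
    """Return data without a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte synchsafe size
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)  # 7 useful bits per byte
        start = 10 + size
        if data[5] & 0x10:  # footer flag set: 10 more bytes belong to the tag
            start += 10
    # ID3v1: fixed 128-byte block at the very end, starting with "TAG"
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]

def audio_hash(data: bytes) -> str:
    """MD5 of everything that is not an ID3 tag."""
    return hashlib.md5(strip_id3(data)).hexdigest()

# Two synthetic "files": same stand-in audio bytes, different tags.
frames = b"\xff\xfb\x90\x00" + bytes(100)
file_a = b"ID3\x04\x00\x00" + bytes([0, 0, 0, 5]) + b"AAAAA" + frames
file_b = b"ID3\x04\x00\x00" + bytes([0, 0, 0, 7]) + b"BBBBBBB" + frames + b"TAG" + bytes(125)
print(audio_hash(file_a) == audio_hash(file_b))
```

A plain hash of `file_a` and `file_b` would differ, since the tag bytes are part of the file.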

admin
Site Admin
Posts: 60357
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%

Re: Duplicate audio file hash

Post by admin »

Easier said than done! I'm not aware of a way to extract the audio data.

mazot
Posts: 42
Joined: 20 Apr 2020 23:19

Re: Duplicate audio file hash

Post by mazot »

I found this online:
https://superuser.com/questions/1044413 ... ith-ffmpeg
and wrote this as an example.

Code:

"_Initialize"; perm $p_note3="<@Zporta>\Context\Context.exe"; perm $ScriptFile= self ("file");
	       perm $p_ffmpeg = "<@Zporta>\CamStudioPortable\App\CamStudio\ffmpeg.exe";
	      
"_Terminate"; unset $p_note3; unset $ScriptFile; unset $p_ffmpeg; 
"Edit script : edit"
	    run "$p_note3 $ScriptFile";
-
"check file"
    $SelectedItems = get("SelectedItemsNames", "|");
    foreach($Item, $SelectedItems, "|") {
        // -vn skips any video stream; -f md5 makes ffmpeg print an MD5 of the audio stream to stdout
        $cmdline = <<<CMD_LINE
cmd /c "$p_ffmpeg -i "$Item" -vn -f md5 - 2>NUL"
CMD_LINE;
        $cmdline = Replace($cmdline, "<crlf>", ' ');
        $md5ans = Runret($cmdline, , 0, 0);
        writefile("C:\MyFiles\tester.txt", $md5ans, "a");
    }
When I tested it, I found anomalies between WAV files and MP3s.
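For anyone who wants to try the same ffmpeg call outside XYplorer, here is a rough Python equivalent. It is only a sketch: it assumes `ffmpeg` is on the PATH (pass another path otherwise) and that the md5 muxer prints a line of the form `MD5=<32 hex digits>`. The function names are invented for this example.

```python
import re
import subprocess
from typing import List, Optional

def ffmpeg_md5_command(path: str, ffmpeg: str = "ffmpeg") -> List[str]:
    """Same flags as the script above: no video, md5 muxer, output to stdout."""
    return [ffmpeg, "-i", path, "-vn", "-f", "md5", "-"]

def parse_md5(output: str) -> Optional[str]:
    """Pull the digest out of ffmpeg's 'MD5=...' line, or None if absent."""
    match = re.search(r"MD5=([0-9a-fA-F]{32})", output)
    return match.group(1).lower() if match else None

def audio_md5(path: str, ffmpeg: str = "ffmpeg") -> Optional[str]:
    """Run ffmpeg on one file and return the audio MD5 (None on failure)."""
    result = subprocess.run(
        ffmpeg_md5_command(path, ffmpeg),
        capture_output=True,  # stderr is captured instead of redirected to NUL
        text=True,
    )
    return parse_md5(result.stdout)
```

Passing the arguments as a list avoids the nested-quote handling that the `cmd /c` line needs for file names with spaces. A call would look like `audio_md5("song.mp3")`, where `song.mp3` is a placeholder file name.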

admin
Site Admin
Posts: 60357
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%

Re: Duplicate audio file hash

Post by admin »

Cool, so that seems to work?

mazot
Posts: 42
Joined: 20 Apr 2020 23:19

Re: Duplicate audio file hash

Post by mazot »

On the whole it seems to work. I created a simple wave file and saved it both as an mp3 and as a wav file.
I MD5'd the wav file, then went into "Properties" and, at the bottom of "Details", found "Remove personal details".
This created copies, which I also MD5'd; this is where I found different results.
On mp3 files I found no problems. I would certainly be interested in any results from other members.
It would also be interesting to see how other formats behave.

RalphM
Posts: 1932
Joined: 27 Jan 2005 23:38
Location: Cairns, Australia

Re: Duplicate audio file hash

Post by RalphM »

While the music might be "the same" in the wav and mp3 files, I would be surprised if they ended up having matching hash values.
If I save a Word document as rtf, the two files don't have the same hash either.
Ralph :)
(OS: W11 22H2 Home x64 - XY: Current beta - Office 2019 32-bit - Display: 1920x1080 @ 125%)

admin
Site Admin
Posts: 60357
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%

Re: Duplicate audio file hash

Post by admin »

mp3 uses lossy compression, so the wave data cannot be the same as in the WAV file.

HumD
Posts: 9
Joined: 19 Sep 2021 08:52

Re: Duplicate audio file hash

Post by HumD »

Very good idea! I fill the AudioMD5 tag (m4a) in foobar2000 using the foo_audiomd5 plugin, but there is no way to search for duplicates; only by going through Excel did I find 6000 duplicate hash entries.
The plugin is described as: "A tool for generating and verifying MD5 checksum of audio data. Uses ffmpeg.exe and supports all formats FFmpeg supports."
https://www.foobar2000.org/
https://foobar.hyv.fi/?view=foo_audiomd5
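The duplicate search done in Excel boils down to grouping files by their audio hash. Here is a small Python sketch of that step, assuming the hashes have already been collected into a path-to-hash mapping (reading the AudioMD5 tag itself is not shown, and the file names are placeholders).

```python
from collections import defaultdict
from typing import Dict, List

def group_duplicates(hashes: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each audio hash that occurs more than once to its sorted file list."""
    groups = defaultdict(list)
    for path, digest in hashes.items():
        groups[digest.lower()].append(path)  # hex digests compare case-insensitively
    return {d: sorted(paths) for d, paths in groups.items() if len(paths) > 1}

# Placeholder data: two files share the same audio hash, one is unique.
example = {
    "a/track01.m4a": "0123456789abcdef0123456789abcdef",
    "b/track01 (copy).m4a": "0123456789ABCDEF0123456789ABCDEF",
    "a/track02.m4a": "ffffffffffffffffffffffffffffffff",
}
for digest, paths in group_duplicates(example).items():
    print(digest, paths)
```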
