Duplicate audio file hash
Posted: 07 Feb 2021 12:05
by Twisten
Similar in concept to the image hash, this is an old idea that could find new life here. It's very simple: when searching for duplicate audio files, ignore all metadata included in the file (e.g. ID3 tags in MP3 files) and hash only the audio data. That way you easily find duplicates you would otherwise miss when all that changed is some tag.
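For MP3 files specifically, the idea can be roughly sketched in Python without any external tool. This is only a minimal sketch under stated assumptions: it skips a leading ID3v2 tag and a trailing ID3v1 tag and hashes whatever lies between them. The function names `strip_id3` and `audio_md5` are made up for illustration, and other tag types (APEv2, Lyrics3) or other container formats are not handled.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return the bytes of an MP3 file without a leading ID3v2 tag
    and without a trailing ID3v1 tag (hypothetical helper)."""
    start, end = 0, len(data)

    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte,
    # then a 4-byte synchsafe size (7 useful bits per byte).
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # ID3v2.4 footer flag: 10 extra bytes
            start += 10

    # ID3v1 tag: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_md5(data: bytes) -> str:
    """MD5 of the file content with the ID3 tags removed."""
    return hashlib.md5(strip_id3(data)).hexdigest()
```

Two copies of the same MP3 that differ only in their ID3 tags should then give the same `audio_md5` value, while a plain MD5 of the whole file would differ.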
Re: Duplicate audio file hash
Posted: 08 Feb 2021 11:12
by admin
Easier said than done! I'm not aware of a way to extract the audio data.
Re: Duplicate audio file hash
Posted: 09 Feb 2021 21:25
by mazot
Found this online.
https://superuser.com/questions/1044413 ... ith-ffmpeg
I wrote this as an example.
Code:
"_Initialize"; perm $p_note3="<@Zporta>\Context\Context.exe"; perm $ScriptFile= self ("file");
perm $p_ffmpeg = "<@Zporta>\CamStudioPortable\App\CamStudio\ffmpeg.exe";
"_Terminate"; unset $p_note3; unset $ScriptFile; unset $p_ffmpeg;
"Edit script : edit"
run "$p_note3 $ScriptFile";
-
"check file"
$SelectedItems = get("SelectedItemsNames","|");
foreach($Item, $SelectedItems,"|") {
$cmdline = <<<CMD_LINE
cmd /c "$p_ffmpeg -i "$Item" -vn -f md5 - 2>NUL"
CMD_LINE;
$cmdline = Replace($cmdline, "<crlf>", ' ');
$md5ans = Runret($cmdline, , 0, 0);
writefile("C:\MyFiles\tester.txt", $md5ans, "a");
}
When I tested it, I found anomalies between WAV files and MP3s.
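For comparison outside XYplorer, the same ffmpeg call can be sketched in Python. This is a rough sketch, not a tested tool: it assumes `ffmpeg` is on the PATH, and the helper names `build_ffmpeg_cmd`, `parse_md5` and `audio_stream_md5` are made up for illustration. As far as I know, the md5 muxer prints a line of the form `MD5=<hex digest>`, and without `-c copy` it hashes the decoded audio (converted to 16-bit PCM by default), not the compressed packets.

```python
import subprocess
from typing import List, Optional


def build_ffmpeg_cmd(ffmpeg: str, path: str) -> List[str]:
    """Same options as the script above: no video, md5 muxer, output to stdout."""
    return [ffmpeg, "-v", "error", "-i", path, "-vn", "-f", "md5", "-"]


def parse_md5(output: str) -> Optional[str]:
    """Pick the digest out of ffmpeg's 'MD5=...' line, or None if it is missing."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("MD5="):
            return line[4:].lower()
    return None


def audio_stream_md5(path: str, ffmpeg: str = "ffmpeg") -> Optional[str]:
    """Run ffmpeg on one file and return the audio MD5 (None on failure)."""
    result = subprocess.run(
        build_ffmpeg_cmd(ffmpeg, path),
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_md5(result.stdout)
```

Passing the arguments as a list avoids the nested quoting that the `cmd /c` line needs for paths with spaces.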
Re: Duplicate audio file hash
Posted: 10 Feb 2021 12:05
by admin
Cool, so that seems to work?
Re: Duplicate audio file hash
Posted: 10 Feb 2021 17:41
by mazot
On the whole it seems to work. I created a simple WAV file, then saved it as an MP3 and also as a second WAV file.
I MD5'd the WAV file, then went into "Properties" and, at the bottom of "Details", found "Remove personal details".
This created copies, which I also MD5'd; this is where I found different results.
With the MP3 files I found no problems. I would certainly be interested in results from other members.
Results for other formats would also be interesting.
Re: Duplicate audio file hash
Posted: 11 Feb 2021 03:53
by RalphM
While the music might be "the same" in the WAV and MP3 files, I would be surprised if they ended up having matching hash values.
If I save a Word document as RTF, the two files don't have the same hash either.
Re: Duplicate audio file hash
Posted: 11 Feb 2021 10:20
by admin
mp3 uses lossy compression, so the wave data cannot be the same as in the WAV file.
Re: Duplicate audio file hash
Posted: 25 Sep 2021 15:48
by HumD
Very good idea! I fill the AudioMD5 tag (m4a) in foobar2000 using the plugin, but there is no way to search for duplicates there. Only by going through Excel did I find 6000 duplicate hash entries.
Plugin description: a tool for generating and verifying MD5 checksums of audio data. It uses ffmpeg.exe and supports all formats FFmpeg supports.
https://www.foobar2000.org/
https://foobar.hyv.fi/?view=foo_audiomd5
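Once every file has an audio hash, finding the duplicates is just a grouping step, which is what the Excel detour does by hand. A minimal Python sketch, assuming the hashes have already been exported as a path-to-digest mapping (the function name `find_duplicates` and the sample paths are made up for illustration):

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicates(hashes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group file paths by audio hash and keep only hashes seen more than once."""
    groups = defaultdict(list)
    for path, digest in hashes.items():
        groups[digest.lower()].append(path)
    return {d: sorted(paths) for d, paths in groups.items() if len(paths) > 1}


# Hypothetical sample data: two files share the same audio hash.
sample = {
    "a/song.m4a": "0f3c",
    "b/song_copy.m4a": "0F3C",
    "c/other.m4a": "9a21",
}
duplicates = find_duplicates(sample)
```

Here `duplicates` holds one group, with both `song` files listed under the hash `0f3c`.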