Duplicate audio file hash
This is similar in concept to the image hash: an old idea that could find new life here. It's very simple: when searching for duplicate audio files, ignore all metadata included in the file (e.g. ID3 tags in MP3 files) and hash only the audio data. That way you easily find duplicates that you would otherwise miss when all that changed is a tag.
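To make the idea concrete, here is a minimal Python sketch for MP3 files. It assumes the only metadata present is a leading ID3v2 tag and/or a trailing 128-byte ID3v1 tag; other tag types (APEv2, Lyrics3) are not handled. The function names `strip_id3` and `audio_hash` are made up for this illustration.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return MP3 bytes without a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4 synchsafe size bytes.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)  # 7 useful bits per byte
        start = 10 + size
        if data[5] & 0x10:  # "footer present" flag adds another 10 bytes
            start += 10
    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def audio_hash(data: bytes) -> str:
    """MD5 of the file content with the ID3 tags removed."""
    return hashlib.md5(strip_id3(data)).hexdigest()
```

Two copies of the same MP3 that differ only in their ID3 tags should then give the same `audio_hash`, while a plain MD5 over the whole file would differ.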
-
- Site Admin
- Posts: 60602
- Joined: 22 May 2004 16:48
- Location: Win8.1 @100%, Win10 @100%
- Contact:
Re: Duplicate audio file hash
Easier said than done! I'm not aware of a way to extract the audio data.
FAQ | XY News RSS | XY Twitter
Re: Duplicate audio file hash
Found this online.
https://superuser.com/questions/1044413 ... ith-ffmpeg
I wrote this as an example.
When I tested it, I found anomalies between WAV files and MP3s.
"_Initialize"; perm $p_note3="<@Zporta>\Context\Context.exe"; perm $ScriptFile= self ("file");
perm $p_ffmpeg = "<@Zporta>\CamStudioPortable\App\CamStudio\ffmpeg.exe";
"_Terminate"; unset $p_note3; unset $ScriptFile; unset $p_ffmpeg;
"Edit script : edit"
run "$p_note3 $ScriptFile";
-
"check file"
$SelectedItems = get("SelectedItemsNames","|");
foreach($Item, $SelectedItems,"|") {
$cmdline = <<<CMD_LINE
cmd /c "$p_ffmpeg -i "$Item" -vn -f md5 - 2>NUL"
CMD_LINE;
$cmdline = Replace($cmdline, "<crlf>", ' ');
$md5ans = Runret($cmdline, , 0, 0);
writefile("C:\MyFiles\tester.txt", $md5ans, "a");
}
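For anyone who wants to try the same ffmpeg call outside XYplorer, here is a rough Python sketch of it. It assumes `ffmpeg` is on the PATH (pass another path via the `ffmpeg` parameter otherwise); the helper names are invented for this example. Passing the arguments as a list avoids the nested quoting needed with `cmd /c`. As far as I know, the `md5` muxer hashes the decoded audio by default, so adding `-c:a copy` would hash the encoded packets instead.

```python
import subprocess


def build_ffmpeg_cmd(path, ffmpeg="ffmpeg"):
    """Command line that makes ffmpeg print an MD5 of the audio to stdout."""
    # -v error : only real errors on stderr (replaces the 2>NUL redirect)
    # -vn      : ignore video streams such as embedded cover art
    # -f md5 - : use the md5 muxer and write its result to stdout
    return [ffmpeg, "-v", "error", "-i", path, "-vn", "-f", "md5", "-"]


def parse_md5(output: str) -> str:
    """Extract the hex digest from ffmpeg's 'MD5=<hex>' output line."""
    for line in output.splitlines():
        if line.startswith("MD5="):
            return line[4:].strip().lower()
    raise ValueError("no MD5 line in ffmpeg output")


def audio_md5(path, ffmpeg="ffmpeg"):
    """Run ffmpeg on one file and return the audio MD5 as a hex string."""
    result = subprocess.run(
        build_ffmpeg_cmd(path, ffmpeg),
        capture_output=True, text=True, check=True,
    )
    return parse_md5(result.stdout)
```

Calling `audio_md5` once per selected file and writing the results to a text file would mirror what the script above does.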
Re: Duplicate audio file hash
On the whole it seems to work. I created a simple wave file and saved it both as an MP3 and as a WAV file.
I MD5'd the WAV file, then went into "Properties" and at the bottom of "Details" found "Remove personal details".
This created copies, which I also MD5'd, and this is where I got different results.
With MP3 files I found no problems. I would certainly be interested in any results from other members.
Results for other formats would also be interesting.
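For comparison with the ffmpeg results on WAV files, the same "hash only the audio" idea can be applied at container level. The sketch below is only an illustration and does not explain the differences seen above: it assumes a plain RIFF/WAVE file, walks its chunks, and hashes just the `fmt ` and `data` chunks, so metadata chunks such as `LIST` are ignored. The function names are hypothetical.

```python
import hashlib
import struct


def wav_chunks(data: bytes):
    """Yield (chunk_id, body) pairs from the bytes of a RIFF/WAVE file."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        yield chunk_id, data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length


def wav_audio_hash(data: bytes) -> str:
    """MD5 over the format and sample data only, skipping metadata chunks."""
    digest = hashlib.md5()
    for chunk_id, body in wav_chunks(data):
        if chunk_id in (b"fmt ", b"data"):
            digest.update(body)
    return digest.hexdigest()
```

If a tool only adds or removes metadata chunks, `wav_audio_hash` should stay the same; if it differs, the sample data or the format header was changed as well.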
Re: Duplicate audio file hash
While the music might be "the same" in the WAV and MP3 files, I would be surprised if they ended up having matching hash values.
If I save a Word document as RTF, the two files do not have the same hash either.
Ralph
(OS: W11 22H2 Home x64 - XY: Current beta - Office 2019 32-bit - Display: 1920x1080 @ 125%)
Re: Duplicate audio file hash
MP3 uses lossy compression, so the wave data cannot be the same as in the WAV file.
Re: Duplicate audio file hash
Very good idea! I fill the AudioMD5 tag (m4a) through foobar2000 with the foo_audiomd5 plugin, but there is no way to search for duplicates there. Only by going through Excel did I find 6000 duplicate hash entries.
https://www.foobar2000.org/
https://foobar.hyv.fi/?view=foo_audiomd5
A tool for generating and verifying MD5 checksum of audio data. Uses ffmpeg.exe and supports all formats FFmpeg supports.
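Once every file has an audio hash (from a tag, from ffmpeg, or from anywhere else), finding the duplicates is just a grouping step, so no spreadsheet is needed. A small Python sketch, assuming the hashes have already been collected into a dict of path to hash; `find_duplicates` is a made-up name:

```python
from collections import defaultdict


def find_duplicates(hashes):
    """Map each hash that occurs more than once to its sorted list of paths.

    hashes: dict of file path -> audio hash (hex string, any letter case).
    """
    groups = defaultdict(list)
    for path, digest in hashes.items():
        groups[digest.lower()].append(path)
    return {d: sorted(paths) for d, paths in groups.items() if len(paths) > 1}
```

For example, `find_duplicates({"a.m4a": "AB12", "b.m4a": "ab12", "c.m4a": "ff00"})` reports `a.m4a` and `b.m4a` as one duplicate group and leaves `c.m4a` out.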