Duplicate Image Finder improvements

Features wanted...
reyaz
Posts: 21
Joined: 29 Feb 2016 17:15

Duplicate Image Finder improvements

Post by reyaz »

I needed to find dupes in collections of images that have only slight differences in RGB values, and couldn't do it properly with how the duplicate finder works now. At tolerance = 1 there are already many false positives.

Seeing the "image hash" having this size, I'm thinking it's just not enough for high resolution cases - that's why it's almost useless for digital images that have small edits in them.

I have 2 suggestions:
1. Change the way the image hash is generated, and allow the user to configure how it works. Depending on how it works now, it might be possible to scale it just by increasing the length of the hash string (it's only 8 bytes now as far as I can see). A rough sketch of what I mean is below, after this list.
2. Add a new method of handling tolerance - with an RGB value as input. For example, the user sets the tolerance to (5,5,5) - it would mean that two pixels with RGB values (20,30,40) and (23,30,39) are treated as equal, but (20,30,40) and (26,30,40) as different. This would be useful in my recent case, where one collection of images was taken from a website which automatically optimizes every image on upload, and the second collection supposedly has the same images from before such optimization took place. Perhaps such a method would also help to locate the same image in worse quality, e.g. if you have one in PNG and another as JPEG saved at 80% quality - the tolerance value could be tweaked as desired to find such cases. (See the second sketch after this list.)
2.1. One more thing may work well with the RGB diff method described above - the number of pixels to tolerate (or a percentage of the full pixel count). E.g. a dupe of a high-resolution image may still contain a few pixels that would not be tolerated by the initial RGB value - so the user may choose to tolerate, say, 50 pixels per image for his current task with 1024x1024 images. Logically, this would apply a second RGB tolerance filter of value (254,254,254) to the first 50 pixels that were not tolerated by the first filter set by the user. This second value could also be made configurable.
2.2. I'm not sure if using RGBA instead of RGB would make much sense, but it should be possible too, though the transparency value may need to be set as either 0.00~1.00 or 0%~100% instead of 0~255.
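
For illustration only - this is not how XYplorer actually builds its image hash, just a minimal sketch of what I mean by a scalable hash, assuming a dHash-style approach with Pillow. hash_size is the knob: 8 gives 64 bits (roughly the current 8 bytes), 16 gives 256 bits and is much less likely to collide on high-resolution images.

from PIL import Image

def dhash(path, hash_size=8):
    # Difference hash: compare each pixel to its right neighbour on a
    # downscaled greyscale copy; hash_size=8 -> 64 bits, 16 -> 256 bits.
    img = Image.open(path).convert("L").resize(
        (hash_size + 1, hash_size), Image.LANCZOS)
    pixels = list(img.getdata())
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | int(left > right)
    return bits

def hamming(a, b):
    # Number of differing bits between two hashes
    return bin(a ^ b).count("1")

# Usage (hypothetical file names):
# h1 = dhash("original.png", hash_size=16)
# h2 = dhash("optimized.jpg", hash_size=16)
# print(hamming(h1, h2))  # small distance = likely duplicates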
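
And here is a rough sketch of suggestions 2 and 2.1 combined - again only my assumption of how it could work, not anything XYplorer does today; the file names are made up and it assumes both images have the same dimensions.

from PIL import Image

def near_duplicate(path_a, path_b, tol=(5, 5, 5), max_bad_pixels=50):
    # True if at most max_bad_pixels pixels differ by more than tol per channel.
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB")
    if a.size != b.size:
        return False
    bad = 0
    for pa, pb in zip(a.getdata(), b.getdata()):
        # A pixel counts as "bad" if any channel differs by more than its tolerance
        if any(abs(ca - cb) > t for ca, cb, t in zip(pa, pb, tol)):
            bad += 1
            if bad > max_bad_pixels:
                return False
    return True

# Usage (hypothetical file names):
# near_duplicate("before_upload.png", "site_optimized.jpg",
#                tol=(5, 5, 5), max_bad_pixels=50)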

RalphM
Posts: 1937
Joined: 27 Jan 2005 23:38
Location: Cairns, Australia

Re: Duplicate Image Finder improvements

Post by RalphM »

Sounds like a very specific wish that seems a bit out of a file manager's league.
Out of curiosity, is there any professional photo app doing such a detailed comparison?

The dupe finder might get awfully slow if it has to compare the RGB values for all the pixels in an image and then keep count of how many actually differ by how much.
Ralph :)
(OS: W11 22H2 Home x64 - XY: Current beta - Office 2019 32-bit - Display: 1920x1080 @ 125%)

reyaz
Posts: 21
Joined: 29 Feb 2016 17:15

Re: Duplicate Image Finder improvements

Post by reyaz »

RalphM wrote: Out of curiosity, is there any professional photo app doing such a detailed comparison?
I don't think there is, but I was mainly trying general file comparison applications, not photo apps.
RalphM wrote: The dupe finder might get awfully slow if it has to compare the RGB values for all the pixels in an image and then keep count of how many actually differ by how much.
Well, doesn't it already compare the RGB values of all the pixels at tolerance = 0?

admin
Site Admin
Posts: 60691
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%

Re: Duplicate Image Finder improvements

Post by admin »

Currently enhancements of the Duplicate Image Finder are not planned. This is a very complex topic and I currently have too many other things on the table. :|

Nevertheless, the case is interesting and I want to solve it. Just need to find the time.
