Similar filename matches in a directory

hermhart
Posts: 110
Joined: 13 Jan 2015 18:41

Similar filename matches in a directory

Post by hermhart » 14 Feb 2019 01:59

I have a set of filenames in a directory that all follow the same format, {descriptor}{3 character code}{3 character code}~{remaining codes}.ext, which I have already broken down with a regular expression. Is there a way for a script to check each filename in the list against all the others and list those where two or more files share the same first two groups of the regular expression?

If it helps, the regular expression I am using is: ([0-9a-zA-Z_.#\-]*)([a-zA-Z0-9]{3})([0-9]{3})(~)([0-9a-z\-]*)(\.)([a-z]*)$

Thank you for any help!

highend
Posts: 7816
Joined: 06 Feb 2011 00:33

Re: Similar filename matches in a directory

Post by highend » 14 Feb 2019 07:53

And now post a real world example of file names...
One of my scripts helped you out? Please donate via Paypal or highend (at) web (dot) de

hermhart
Posts: 110
Joined: 13 Jan 2015 18:41

Re: Similar filename matches in a directory

Post by hermhart » 14 Feb 2019 18:25

A real world example would be:
12345678a05001~a.ext

So the regular expression should be able to group this as:
12345678 a05 001 ~ a .ext

The first group (12345678) could be longer or shorter than 8 characters and also contain letters.
The second group (a05) will always have 3 characters.
The third group (001) will always have 3 characters.
A tilde (~) separator.
Then the last group before the extension could be alphanumeric characters.
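
To double-check the breakdown above, here is a minimal sketch in Python (not XYplorer script; it is only meant to show how the pattern from the first post splits the sample name). The helper name split_name is made up for this illustration. Note that the pattern captures the dot and the extension as two separate groups, so it yields seven groups in total.

```python
import re

# The pattern quoted in the first post, unchanged.
PATTERN = re.compile(
    r"([0-9a-zA-Z_.#\-]*)([a-zA-Z0-9]{3})([0-9]{3})(~)([0-9a-z\-]*)(\.)([a-z]*)$"
)

def split_name(filename):
    """Return the captured groups as a tuple, or None if the name does not match.

    split_name is a hypothetical helper for this illustration only.
    """
    m = PATTERN.match(filename)
    return m.groups() if m else None

print(split_name("12345678a05001~a.ext"))
# → ('12345678', 'a05', '001', '~', 'a', '.', 'ext')
```

The first group is greedy, but the fixed-width groups and the tilde that follow force it to stop three plus three characters before the "~".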

If two or more files share the same characters in the first two groups, the script should list those files. In the list below, the two filenames in the middle would be listed.

12345678a05001~a.ext
87654321a05002~b.ext
87654321a05003~c.ext
65432178b09000~b.ext

I hope I explained that enough to make some sense.
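
To make the request concrete, here is a rough Python sketch of the grouping described above (Python rather than XYplorer script, purely as an illustration). It assumes the pattern from the first post, shortened to the part up to the tilde; the names KEY and find_shared_prefixes are made up for the example.

```python
import re
from collections import defaultdict

# First three groups of the pattern from the first post, anchored at the start.
KEY = re.compile(r"^([0-9a-zA-Z_.#\-]*)([a-zA-Z0-9]{3})([0-9]{3})~")

def find_shared_prefixes(filenames):
    """Map each 'group 1 + group 2' key to its files, keeping only keys
    shared by two or more files. Hypothetical helper for illustration."""
    groups = defaultdict(list)
    for name in filenames:
        m = KEY.match(name)
        if m:  # names that do not fit the format are skipped
            groups[m.group(1) + m.group(2)].append(name)
    return {key: files for key, files in groups.items() if len(files) >= 2}

names = [
    "12345678a05001~a.ext",
    "87654321a05002~b.ext",
    "87654321a05003~c.ext",
    "65432178b09000~b.ext",
]

for key, files in find_shared_prefixes(names).items():
    print(key, files)
# → 87654321a05 ['87654321a05002~b.ext', '87654321a05003~c.ext']
```

Only the two middle filenames are reported, because they are the only ones that agree on both the descriptor and the first 3-character code.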

highend
Posts: 7816
Joined: 06 Feb 2011 00:33

Re: Similar filename matches in a directory

Post by highend » 14 Feb 2019 18:50

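    // Files in the current folder (no folders, names without paths), one per line
    $files = listfolder(, , 1+4, <crlf>);
    $log = "";
    while ($files) {
        // Build the ID of the first remaining file from regex groups 1 and 2
        $id = regexreplace(gettoken($files, 1, <crlf>), "^([0-9a-zA-Z_.#-]*)([a-zA-Z0-9]{3})([0-9]{3})(.*)", "$1$2", 1);
        // Escape regex metacharacters so the ID can be used as a literal prefix
        $escaped = regexreplace($id, "([\\.+(){\[^$])", "\$1");

        // Collect every remaining line that starts with this ID
        $matches = regexmatches($files, "^" . $escaped . ".*?(?=\r?\n|$)", <crlf>, 1);
        if (gettoken($matches, "count", <crlf>) >= 2) {
            $log .= $matches . <crlf 2> . strrepeat("-", 20) . <crlf 2>;
        }
        // Drop those lines from the list and remove the empty entries left behind
        $files = formatlist(regexreplace($files, "^" . $escaped . ".*?(?=\r?\n|$)", , 1), "e", <crlf>);
    }
    // Show the result
    if ($log) {
        text "Matching files...<crlf>" . strrepeat("=", 17) . <crlf 2> . $log;
    } else {
        text "No matches found!";
    }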

hermhart
Posts: 110
Joined: 13 Jan 2015 18:41

Re: Similar filename matches in a directory

Post by hermhart » 15 Feb 2019 19:03

highend,

I don't even know what to say except for amazing and thank you.

As an added bonus: if the second group contains a certain three-character code (e.g. btr or imp), is there a way to exclude one or two such codes when needed? If not, I can very much work with what you have already done.

highend
Posts: 7816
Joined: 06 Feb 2011 00:33

Re: Similar filename matches in a directory

Post by highend » 15 Feb 2019 19:25

Add another check inside the if (gettoken($matches, "count", <crlf>) >= 2) {
block that uses regexmatches to test whether the second group does NOT contain
any of the ignored patterns, and only run the $log .= $matches . <crlf 2> . strrepeat("-", 20) . <crlf 2>;
line if that is true.
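
As a rough illustration of that extra check, here is the idea as a Python sketch (Python rather than XYplorer script, so this shows the logic and is not a drop-in replacement for the script above). The codes btr and imp come from the question; IGNORED_CODES, KEY and find_shared_prefixes are made-up names for the example, and the comparison is case-sensitive by assumption.

```python
import re
from collections import defaultdict

# First three groups of the pattern from the first post, anchored at the start.
KEY = re.compile(r"^([0-9a-zA-Z_.#\-]*)([a-zA-Z0-9]{3})([0-9]{3})~")

# Codes to leave out of the report (the two examples from the question).
IGNORED_CODES = {"btr", "imp"}

def find_shared_prefixes(filenames, ignored=IGNORED_CODES):
    """Group files by 'group 1 + group 2', skipping ignored second groups,
    and keep only keys shared by two or more files. Hypothetical helper."""
    groups = defaultdict(list)
    for name in filenames:
        m = KEY.match(name)
        # Every file in a group has the same second group, so filtering
        # per file is equivalent to skipping the whole group.
        if m and m.group(2) not in ignored:
            groups[m.group(1) + m.group(2)].append(name)
    return {key: files for key, files in groups.items() if len(files) >= 2}

names = [
    "1111btr001~a.ext",
    "1111btr002~b.ext",
    "2222a05001~a.ext",
    "2222a05002~b.ext",
]

print(find_shared_prefixes(names))
# → {'2222a05': ['2222a05001~a.ext', '2222a05002~b.ext']}
```

The two btr files share their first two groups but are not reported, because btr is in the ignore set.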

hermhart
Posts: 110
Joined: 13 Jan 2015 18:41

Re: Similar filename matches in a directory

Post by hermhart » 15 Feb 2019 20:02

highend,

Thank you so much!
