
Re: Command Number for "Copy Containing Folder(s)"?

Posted: 11 Sep 2014 23:08
by highend
A slightly changed version...

Tested on 26,547 files in 4,279 folders (SSD).

The previous version needed 2,350 ms and the new one needs 1,400 ms, so the run time drops by about 40% ((2350 - 1400) / 2350 ≈ 0.40).
Most of the gain comes from deriving the folder list from the $files variable instead of running a second folderreport().

The previous version didn't work correctly because it cut off parts of paths that only partially matched an excluded folder.
The new version adds a trailing pattern so that only complete paths are removed.

    $startingFolder = inputfolder("C:\", "Please select folder to search");
    $excludedFiles = input("Enter the file name(s) that should NOT be in any of the folders", "File names must include their extension but NOT the path! Separate all items with a pipe '|'. Wildcards are not allowed!");

    $files = folderreport("files", "r", $startingFolder, "r", , "<crlf>");
    // Derive folders from $files (faster than an extra folderreport)
    $folders = formatlist(regexmatches($files, "^.*(?=\\)", "<crlf>"), "dents", "<crlf>");

    $metaCharacters = "(\\|\*|\^|\$|\.|\+|\(|\)|\[|\{)";
    $escapedCharacters = "\$1";

    $excludedFiles = regexreplace($excludedFiles, $metaCharacters, $escapedCharacters);
    $matches = regexmatches($files, "^.*?(" . $excludedFiles . ")$", "<crlf>");

    if ($matches) {
        // Get everything in each line up to (but not including) the last backslash -> path component
        $pattern = regexreplace(formatlist(regexmatches($matches, "^.*(?=\\)", "|"), "dents"), $metaCharacters, $escapedCharacters);
        // To remove only full paths we have to add an additional trailing pattern
        // It also removes the need for the formatlist() call previously used at the end
        $pattern = trim(regexreplace($pattern, "(\||$)", "(\r?\n|$)|"), "|", "R");
        $folders = regexreplace($folders, "($pattern)");
    }
    text $folders;
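For readers who want to check the logic outside XYplorer, here is a rough Python port of the script above. It is a sketch under assumptions, not highend's code: the file list is passed in directly instead of coming from folderreport(), re.escape stands in for the hand-written $metaCharacters replacement, Python's re module replaces XYplorer's regex engine, and the function name folders_without and the sample paths are hypothetical. One deliberate difference: the sketch requires a backslash before the excluded name, so that excluding a.txt does not also match data.txt.

```python
import re


def folders_without(files: str, excluded_names: list[str]) -> list[str]:
    """Return the folders from `files` that contain none of `excluded_names`."""
    text = files.replace("\r\n", "\n")  # normalize so ^ and $ work per line in re.M mode

    # 1. Derive the folder list from the file list (no second folder scan)
    folders = sorted(set(re.findall(r"^.*(?=\\)", text, flags=re.M)))
    if not excluded_names:
        return folders

    # 2. Escape each name, join the names as alternatives and find matching files.
    #    Unlike the script, a backslash is required directly before the name.
    names = "|".join(re.escape(n) for n in excluded_names)
    matches = re.findall(rf"^.*\\(?:{names})$", text, flags=re.M)

    # 3. Parent folders of those files, escaped and joined into one alternation.
    #    The trailing (?:\r?\n|$) means only complete lines can match.
    bad = sorted({m.rsplit("\\", 1)[0] for m in matches})
    if not bad:
        return folders
    pattern = "|".join(re.escape(b) + r"(?:\r?\n|$)" for b in bad)

    # 4. Remove those folders from the CRLF-joined list, then split it again
    remaining = re.sub(pattern, "", "\r\n".join(folders))
    return [f for f in remaining.split("\r\n") if f]


files = "C:\\P\\keep.txt\r\nC:\\P\\Sub\\skip.me\r\nC:\\Q\\data.txt\r\nC:\\R\\a.txt"
print(folders_without(files, ["a.txt"]))  # ['C:\\P', 'C:\\P\\Sub', 'C:\\Q']
```

Working on one CRLF-joined string in step 4 mirrors the script's single regexreplace() call; in plain Python a set difference on the folder list would do the same job.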

Re: Command Number for "Copy Containing Folder(s)"?

Posted: 15 Sep 2014 21:06
by Jeff Bellune
As long as we're going for speed, wouldn't a negative character class be much faster than the lazy quantifier? Like this:

    // Old command:
    $matches = regexmatches($files, "^.*?(" . $excludedFiles . ")$", "<crlf>");
    // New command:
    $matches = regexmatches($files, "^.*[^\r\n](" . $excludedFiles . ")$", "<crlf>");

In my simple tests, it cuts out at least one-third of the engine's backtracking steps.

What do you think?

Jeff
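Before comparing speed, it helps to confirm that the two patterns select the same lines. The Python sketch below checks this on a made-up sample list, using Python's re module rather than XYplorer's engine. The third pattern, `^[^\r\n]*(...)$`, is an assumption added for comparison: it is the more common negated-class idiom and does not appear in the thread. Python's re does not report backtracking steps, so this only shows equivalence on the sample, not speed.

```python
import re

# Made-up sample list with LF line breaks: in re.MULTILINE mode Python's $
# matches just before "\n", so a "\r" in front of it would break the match
files = "C:\\A\\x.log\nC:\\B\\notes.txt\nC:\\C\\y.log\nC:\\D\\readme.md"
excluded = r"x\.log|y\.log"

patterns = {
    "lazy": rf"^.*?({excluded})$",                 # lazy quantifier version
    "jeff": rf"^.*[^\r\n]({excluded})$",           # Jeff's suggested version
    "negated_class": rf"^[^\r\n]*({excluded})$",   # common negated-class idiom
}

# Collect the whole matching line (group 0) for each pattern
results = {
    name: [m.group(0) for m in re.finditer(p, files, flags=re.M)]
    for name, p in patterns.items()
}
print(results["lazy"])  # ['C:\\A\\x.log', 'C:\\C\\y.log']
```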

Re: Command Number for "Copy Containing Folder(s)"?

Posted: 15 Sep 2014 21:14
by highend
Run some real-life speed tests (20-30k files) on both of them and then show us the results :)
I don't know if it really matters, considering how fast regexmatches() is.
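A speed test of the kind suggested here could be sketched with Python's timeit on a synthetic list of 25,000 paths. This is only an illustration under assumptions: it measures Python's re engine, not XYplorer's regexmatches(); the generated paths, the excluded name file42.txt and the helper name run are made up; and no timings are claimed here.

```python
import re
import timeit

# Synthetic file list: 25,000 paths spread over 2,500 folders (made-up data)
files = "\n".join(f"C:\\Root\\Dir{i // 10}\\file{i}.txt" for i in range(25_000))
excluded = re.escape("file42.txt")

lazy = re.compile(rf"^.*?({excluded})$", re.M)
char_class = re.compile(rf"^.*[^\r\n]({excluded})$", re.M)


def run(pattern: re.Pattern) -> list[str]:
    """Return every whole line matched by `pattern`."""
    return [m.group(0) for m in pattern.finditer(files)]


# Both patterns must select the same lines before their timings are comparable
assert run(lazy) == run(char_class)

for name, pattern in (("lazy", lazy), ("char_class", char_class)):
    seconds = timeit.timeit(lambda: run(pattern), number=5)
    print(f"{name}: {seconds / 5 * 1000:.1f} ms per run")
```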

Re: Command Number for "Copy Containing Folder(s)"?

Posted: 15 Sep 2014 21:28
by Jeff Bellune
highend wrote: Run some real-life speed tests (20-30k files) on both of them and then show us the results :)
Is that your way of saying, "It doesn't matter."? :)

Re: Command Number for "Copy Containing Folder(s)"?

Posted: 16 Sep 2014 16:59
by Jeff Bellune
Working my way through this with RegexBuddy, I have to say that this section of code is brilliant:

    // Get everything in each line up to (but not including) the last backslash -> path component
    $pattern = regexreplace(formatlist(regexmatches($matches, "^.*(?=\\)", "|"), "dents"), $metaCharacters, $escapedCharacters);
    // To remove only full paths we have to add an additional trailing pattern
    // It also removes the need for the formatlist() call previously used at the end
    $pattern = trim(regexreplace($pattern, "(\||$)", "(\r?\n|$)|"), "|", "R");
    $folders = regexreplace($folders, "($pattern)");

From my reading it seems that "\r?\n" makes the carriage return optional. Is that correct, and if so, is that for Linux or other OS compatibility?

Jeff
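The job of the trailing (\r?\n|$) in the section quoted above can be illustrated in Python. This uses Python's re module, not XYplorer's engine, and the folder names are made up: without the trailing pattern, removing C:\Data also cuts the prefix off C:\Data2; with it, only a complete line (plus its line break) is removed.

```python
import re

folders = "C:\\Data\r\nC:\\Data2\r\nC:\\Other"
excluded = re.escape("C:\\Data")

# Without a trailing pattern: every occurrence is removed, even inside "C:\Data2"
naive = re.sub(excluded, "", folders)

# With the trailing pattern: only a complete line and its line break are removed
anchored = re.sub(excluded + r"(\r?\n|$)", "", folders)

print(repr(naive))     # '\r\n2\r\nC:\\Other'
print(repr(anchored))  # 'C:\\Data2\r\nC:\\Other'
```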

Re: Command Number for "Copy Containing Folder(s)"?

Posted: 16 Sep 2014 17:05
by highend
Jeff Bellune wrote: Is that your way of saying, "It doesn't matter."?
Nope. It's my way of saying: I don't know how much this affects the execution time of the command on a large set of files. If it's 10 milliseconds, who cares? But if it's a few seconds...
Jeff Bellune wrote: it seems that "\r?\n" makes the carriage return optional. Is that correct, and if so, is that for Linux or other OS compatibility?
Windows text files terminate lines with \r\n, while UNIX text files use only \n. By making the \r optional, the pattern matches either Windows or Linux line terminators.
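The effect of the optional carriage return can be checked quickly in Python (Python's re module; the paths are made up): the same `\r?\n` pattern splits a Windows-style and a UNIX-style list into identical lines.

```python
import re

windows = "C:\\A\r\nC:\\B\r\nC:\\C"
unix = "C:\\A\nC:\\B\nC:\\C"

line_break = re.compile(r"\r?\n")  # \r is optional, \n is required

print(line_break.split(windows))  # ['C:\\A', 'C:\\B', 'C:\\C']
print(line_break.split(unix))     # ['C:\\A', 'C:\\B', 'C:\\C']
```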

Re: Command Number for "Copy Containing Folder(s)"?

Posted: 16 Sep 2014 18:09
by Jeff Bellune
Regarding the use of ".*?" versus ".*[^\r\n]": on a set of 7,900 folders containing 28,000 files, the difference is about 0.5 seconds when only a single file is listed in the excluded files list. (NB: that single file is found in many of the test folders.) As the excluded files list grows to 10-12 files, the time difference becomes essentially nil. I assume that's because more excluded files means more folders are excluded, and the time spent replacing, formatting, trimming and removing folders from the list far exceeds the time spent processing the string of excluded folders. I can't think of any other reason for the time difference to shrink as the user enters more excluded files.

Thanks for confirming my hypothesis about OS compatibility.

Cheers,
Jeff