Tiagos thread

highend
Posts: 13311
Joined: 06 Feb 2011 00:33

Tiagos thread

Post by highend »

[moved from "Like a new feature? Say thanks here"]
Well... it seems to be not exactly a "new" feature but I'd like to say thanks anyway: gettokenindex saved my bacon today as I needed a custom way to count ~19k entries. Built a script in less than 20 min, took seconds to process the input. Cheers! :beer:
Post that script and attach the source data zipped to it in a new thread. I bet that can be done faster xD
One of my scripts helped you out? Please donate via Paypal

tiago
Posts: 589
Joined: 14 Feb 2011 21:41

Re: Like a new feature? Say thanks here

Post by tiago »

What about those files?
Attachments
output.zip (3.95 KiB)
ridiculous input.zip (8.2 KiB)
Power-hungry user!!!

highend

Re: Like a new feature? Say thanks here

Post by highend »

1. Move this to a new thread. It doesn't belong here.
2. Where is the code that produces the list?
3. How long did it take, in milliseconds?

tiago

Code Improvements

Post by tiago »

Using the inputs posted here http://www.xyplorer.com/xyfc/viewtopic. ... 63#p138363
highend suggested he could improve the execution time a bit. He can! His solution takes 1313 ms against a fabulous

22547 msecs

o'mine.

Trying to integrate a blacklist into it breaks the code: regexmatches, another of his suggestions, reports errors no matter what I do.

So please, highend, put me out of my misery! :blackstorm:


:mrgreen:

highend

Re: Code Improvements

Post by highend »

And your script is... where? oO

tiago

Re: Tiagos thread

Post by tiago »

...in the trash bin...? :roll:

I was so disappointed by my results that I threw it away, highend. I may recall the dumb routines I used, but it would be a total shame to post them here. Please spare me. :whistle:

highend

Re: Tiagos thread

Post by highend »

Nope. To see what can be done better (with limited knowledge) the script is required...

tiago

Re: Tiagos thread

Post by tiago »

>sigh<

Ok, I'll do it again and post it later.

highend

Re: Tiagos thread

Post by highend »

And, where is it? :)

tiago

Re: Tiagos thread

Post by tiago »

You're mean...

:mrgreen:

Code: Select all

 $blklist = "oinutter.wav,oops.wav,justyouwait.wav"; // blacklist of common words, comma (",") separated
//you can just comment it so no blacklist will be applied

 $out = "";
 $out2 = "";

 $input = formatlist(<clipboard>, se, <crlf>);
 $inputB = formatlist($input, sde, <crlf>);
 foreach($tk, "$inputB", <crlf>) {
 $check = regexmatches("$blklist", "$tk");
 if($check != "") { continue; }

 $count = gettokenindex($tk, "$input", <crlf>, ic);
 $out = $out . $count . " = " . $tk . <crlf>; 
 $out2 = $out2 . $tk . " = " . $count . <crlf>; 

          }

 $out = formatlist($out, rsn, <crlf>);

 text $out;
 text $out2;

highend

Re: Tiagos thread

Post by highend »

1. You still quote variables that don't need any quotation
E.g.:

Code: Select all

 $check = regexmatches("$blklist", "$tk");
Quote them when necessary, e.g. when they are inside a string

2. Be careful what you're doing here for the blacklist check!
A $tk contains a single dot (before the extension). In a regex pattern an unescaped dot isn't a literal dot; it's a metacharacter that matches any single character.
For the data you're working with it doesn't matter much, but it can. So you'd better escape your regex search patterns.

3. You don't need to build the output twice. Just transform one output into the other by using a regexreplace on it (with two capturing groups).

4. Apart from that, it's no wonder that it's slow: 19k entries = 19k loop runs, and loops in XY aren't the fastest.
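
The dot problem from the blacklist point above can be reproduced in a few lines. This is only a sketch in Python, not XYplorer script (chosen so it can be run standalone); Python's `re.escape` stands in for a hand-written escape helper, and both file names are made up for illustration.

```python
import re

blacklist_entry = "oops.wav"   # hypothetical blacklist entry
candidate = "oopsXwav"         # hypothetical name that should NOT be blacklisted

# Unescaped: "." matches any single character, so this is a false positive.
unescaped_hit = re.search(blacklist_entry, candidate) is not None

# Escaped: "." is now a literal dot, so only the exact text matches.
escaped_pattern = re.escape(blacklist_entry)
escaped_hit = re.search(escaped_pattern, candidate) is not None
exact_hit = re.search(escaped_pattern, "oops.wav") is not None

print(unescaped_hit, escaped_hit, exact_hit)  # prints: True False True
```

The same reasoning applies to any regex engine: a search term taken from data should be escaped before it is used as a pattern.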

This is an optimized version that uses a technique (which some people don't like) to reduce the number of loop runs, leading to a significantly faster runtime. E.g. your input file has 18358 entries and your loop runs through all of them. Mine needs only 281. As for the regex lookup for a blacklist: put it in yourself.

Code: Select all

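    $file = "D:\Users\Highend\Downloads\ridiculous input\ridiculous input.txt";
    // Drop empty lines and make sure the list ends with a line break
    $content = formatlist(readfile($file), "e", <crlf>) . <crlf>;
    $count = gettoken($content, "count", <crlf>);

    $result = "";
    while ($i++ < $count) {
        // Take the first remaining line and stop once nothing is left
        $line = gettoken($content, 1, <crlf>);
        if !($line) { break; }
        $pattern = regexEscape($line);
        // Append the line and the number of regex matches found for it
        $result = $result . $line . " = " . gettoken(regexmatches($content, $pattern), "count", "|") . <crlf>;
        // Remove every copy of this line so each distinct entry costs one loop run
        $content = regexreplace($content, "^$pattern\r?\n");
    }
    text "Runs: $i" . <crlf> . formatlist($result, "se", <crlf>);

function regexEscape($string) {
    // Escape regex metacharacters so the text is matched literally
    return regexreplace($string, "([\\^$.+*|?(){\[])", "\$1");
}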
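
For readers who want to try the loop-reduction idea outside XYplorer, here is a rough Python sketch of the same technique: take the first remaining line, count and remove every copy of it in one regex pass, and repeat. It is an illustration under assumptions, not a port of the script above. The function name `count_by_stripping` and the sample data are invented, and unlike the script it anchors the count to whole lines.

```python
import re

def count_by_stripping(lines):
    """Count each distinct line, looping once per distinct value."""
    # One entry per line, empty entries dropped, every line newline-terminated.
    content = "".join(line + "\n" for line in lines if line)
    counts = {}
    runs = 0
    while content:
        runs += 1
        first = content.split("\n", 1)[0]
        # Whole-line pattern, escaped so "." and friends are matched literally.
        pattern = re.compile("^" + re.escape(first) + r"\n", re.MULTILINE)
        # subn removes every copy of the line and reports how many it removed.
        content, removed = pattern.subn("", content)
        counts[first] = removed
    return counts, runs

sample = ["a.wav", "b.wav", "a.wav", "c.wav", "a.wav", "b.wav"]
counts, runs = count_by_stripping(sample)
print(counts, runs)  # prints: {'a.wav': 3, 'b.wav': 2, 'c.wav': 1} 3
```

Six input lines need only three loop runs here, one per distinct entry, which is the same effect as 281 runs for 18358 entries.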

tiago

Re: Tiagos thread

Post by tiago »

Hello highend. I have read here and there that it is safe practice to put quotes around anything, which is not true when it comes to foreach loops. Conversely, I don't put quotes on parameters like 'count' for gettoken as you do. I'm not a specialist, as you can see. Any special reason there?

Yes, I noticed the dot problem on some other actual data inputs. I will make some changes following your instructions, thanks.

Those two outputs are needed. Will check your code if I can strip the first back into action. As well as check it against your notes on optimization. Cheers! :beer:

highend

Re: Tiagos thread

Post by highend »

tiago wrote: "which is not true when it comes to foreach loops"
This has nothing to do with foreach loops.

tiago wrote: "I don't put quotes on parameters like 'count' for gettoken"
You should. Look up the command in the help file. The examples use quotes, too.
Though it's currently not enforced by XY's scripting engine...

tiago wrote: "Those two outputs are needed"
I know. I didn't say: Don't use them. But appending to the output twice inside the loop means 18k+ more lines to execute, instead of one line after the loop has finished.
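
The "one line after the loop" idea, swapping the two halves of each line with two capturing groups, might look like this. Again, this is a Python sketch rather than XYplorer script; the sample lines are invented and only the "count = name" layout is taken from the thread.

```python
import re

out = "3 = a.wav\n2 = b.wav\n1 = c.wav\n"   # "count = name" lines (sample data)

# Group 1 is the count, group 2 is the name. The replacement swaps them.
out2 = re.sub(r"^(\d+) = (.+)$", r"\2 = \1", out, flags=re.MULTILINE)

print(out2, end="")
# a.wav = 3
# b.wav = 2
# c.wav = 1
```

The second output is derived from the first in a single pass, so the loop only has to build one string.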

PeterH
Posts: 2785
Joined: 21 Nov 2005 20:39
Location: Germany

Re: Tiagos thread

Post by PeterH »

I'd express it in another way:
- if you put a simple variable inside quotes that means to quote it - and makes no sense
- i.e.: quoting a variable makes no sense (as highend says)
- BUT: you are allowed to concatenate a variable to a (double!) quoted string by just putting it into the quotes.

Example:
Echo "result is: " . $result; // basic syntax, variable unquoted
Echo "result is: $result"; // 'enhanced': $result inside the double quotes - no concat needed (it's implied)
Echo "result is: " . "$result"; // This works, but makes NO SENSE at all!

A situation where you *must* do some concatenation:
Echo 'Value of $x1 is: ' . $x1;
-> For $x1 to be written as is it *must* be in single quotes - so you *must* specify the 2nd $x1 outside of it. Of course: unquoted!

If you explain "$result" *very* exactly, it means:
- the double quotes specify a *string*
- the string contains a variable name - that's resolved and then concatenated to the rest of the string
- the rest of the string is nothing - so you request to concatenate the contents of the var to nothing
- it's about the same as: "".$result."" i.e. concatenate the var to 2 strings containing nothing
Win11 Pro 223H2 German
