Quote: Well... it seems to be not exactly a "new" feature but I'd like to say thanks anyway: gettokenindex saved my bacon today as I needed a custom way to count ~19k entries. Built a script in less than 20 min, took seconds to process the input. Cheers!
Post that script and attach the source data zipped to it in a new thread. I bet that can be done faster xD
Tiagos thread
[moved from "Like a new feature? Say thanks here"]
One of my scripts helped you out? Please donate via Paypal
Re: Like a new feature? Say thanks here
What about those files?
Attachments:
- output.zip (3.95 KiB)
- ridiculous input.zip (8.2 KiB)
Power-hungry user!!!
Re: Like a new feature? Say thanks here
1. Move this to a new thread. It doesn't belong here.
2. And where is the code that produces the list?
3. How long did it take, in milliseconds?
Code Improvements
Using the inputs posted here http://www.xyplorer.com/xyfc/viewtopic. ... 63#p138363
highend suggested he could improve execution times a bit. He can! His solution takes 1313 ms against a fabulous 22547 ms of mine.
Trying to integrate a blacklist into it breaks the code, as regexmatches, another suggestion of his, reports errors no matter what I do.
So please, highend, put me out of my misery!
Re: Code Improvements
And your script is... where? oO
Re: Tiagos thread
...in the trash bin...?
I was so disappointed by my results that I threw it away, highend. I may be able to recall the dumb routines I used, but it would be a total shame to post them here. Please spare me.
Re: Tiagos thread
Nope. To see what can be done better (with limited knowledge) the script is required...
Re: Tiagos thread
You're mean...
Code: Select all
    $blklist = "oinutter.wav,oops.wav,justyouwait.wav"; // blacklist of common words, comma (",") separated
    //you can just comment it so no blacklist will be applied
    $out = "";
    $out2 = "";
    $input = formatlist(<clipboard>, se, <crlf>);
    $inputB = formatlist($input, sde, <crlf>);
    foreach($tk, "$inputB", <crlf>) {
        $check = regexmatches("$blklist", "$tk");
        if($check != "") { continue; }
        $count = gettokenindex($tk, "$input", <crlf>, ic);
        $out = $out . $count . " = " . $tk . <crlf>;
        $out2 = $out2 . $tk . " = " . $count . <crlf>;
    }
    $out = formatlist($out, rsn, <crlf>);
    text $out;
    text $out2;
Re: Tiagos thread
1. You still quote variables that don't need any quotation.
E.g.:
Code: Select all
$check = regexmatches("$blklist", "$tk");
Quote them only when necessary, e.g. when they are inside a string.
2. Be careful what you're doing here for the blacklist check!
A $tk contains a single dot (before the extension). In regex language this isn't a literal dot; it's a metacharacter that matches any single character.
For the data you're working with it doesn't matter that much, but it can. So you'd better escape your regex search patterns.
3. You don't need to build the output twice. Just transform one output into the other by using a regexreplace on it (with two capturing groups).
4. Apart from that, it's no wonder that it's slow. E.g. 19k entries = 19k loop runs, and loops in XY aren't the fastest.
This is an optimized version that uses a technique (which some people don't like) to reduce the number of loop runs, leading to a significantly faster runtime. E.g. your input file has 18358 entries and your loop runs through all of them. Mine needs only 281. Doing a regex lookup for a blacklist -> put it in yourself.
Code: Select all
    $file = "D:\Users\Highend\Downloads\ridiculous input\ridiculous input.txt";
    $content = formatlist(readfile($file), "e", <crlf>) . <crlf>;
    $count = gettoken($content, "count", <crlf>);
    $result = "";
    while ($i++ < $count) {
        // Take the first remaining line; stop once nothing is left
        $line = gettoken($content, 1, <crlf>);
        if !($line) { break; }
        $pattern = regexEscape($line);
        // Count the matches for this line, then remove its lines from $content
        $result = $result . $line . " = " . gettoken(regexmatches($content, $pattern), "count", "|") . <crlf>;
        $content = regexreplace($content, "^$pattern\r?\n");
    }
    text "Runs: $i" . <crlf> . formatlist($result, "se", <crlf>);
function regexEscape($string) {
    // Prefix regex metacharacters with a backslash so they match literally
    return regexreplace($string, "([\\^$.+*|?(){\[])", "\$1");
}
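For anyone who wants to try the loop-reduction idea outside XYplorer, here is a minimal sketch in Python rather than XY script. It is not highend's code: the function name `count_lines` and the sample list are made up for illustration. It assumes the same approach as the script above: take the first remaining line, count every line equal to it, then delete all of those lines in one regex pass, so the loop runs once per distinct entry instead of once per entry.

```python
import re


def count_lines(content: str) -> tuple[dict[str, int], int]:
    """Count identical lines; return (counts, number of loop runs)."""
    # Drop empty lines and make sure every remaining line ends with "\n".
    lines = [ln for ln in content.splitlines() if ln]
    remaining = "".join(ln + "\n" for ln in lines)
    counts: dict[str, int] = {}
    runs = 0
    while remaining:
        runs += 1
        first = remaining.split("\n", 1)[0]
        # Escape the line so "." and friends are matched literally,
        # and anchor it so only whole lines are counted.
        pattern = re.compile(r"^" + re.escape(first) + r"\n", re.MULTILINE)
        counts[first] = len(pattern.findall(remaining))
        # Remove every occurrence at once; this is what shrinks the loop.
        remaining = pattern.sub("", remaining)
    return counts, runs


# Hypothetical sample data: five non-empty entries, three distinct ones.
sample = "oops.wav\nding.wav\noops.wav\n\noopsXwav\noops.wav\n"
counts, runs = count_lines(sample)
print(counts, runs)
```

With this sample, the five non-empty entries are handled in three loop runs, and `oopsXwav` is not counted as `oops.wav` because the dot is escaped.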
Re: Tiagos thread
Hello highend. I have read here and there that it is safe practice to put quotes around everything, which is not true when it comes to foreach loops. Conversely, I don't put quotes on parameters like 'count' for gettoken as you do. I'm no specialist, as you can see. Any special reason for that?
Yes, I noticed the dot problem with some other real data inputs. I will make some changes following your instructions, thanks.
Those two outputs are needed. I'll check your code to see if I can get the first one back into action, and check my script against your notes on optimization. Cheers!
Re: Tiagos thread
Quote: which is not true when it comes to foreach loops
This has nothing to do with foreach loops.
Quote: I don't put quotes on parameters like 'count' for gettoken
You should. Look up the command in the help file. The examples use quotes, too.
Though it's currently not enforced by XY's scripting engine...
Quote: Those two outputs are needed
I know. I didn't say: Don't use them. But appending to the output twice per loop run is 18k+ more lines to execute, instead of one line after the loop has finished.
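The "one line after the loop has finished" idea can be sketched like this, in Python rather than XY script. The helper name `swap_columns` and the sample text are hypothetical. The sketch assumes every output line has the form `name = count` with "\n" line endings, and uses two capturing groups to produce the `count = name` variant from it.

```python
import re


def swap_columns(report: str) -> str:
    """Turn each 'name = count' line into 'count = name'."""
    # Group 1: the name (greedy, so the split happens at the last " = ").
    # Group 2: the count (digits only) at the end of the line.
    return re.sub(r"^(.+) = (\d+)$", r"\2 = \1", report, flags=re.MULTILINE)


# Hypothetical sample output; the last name itself contains " = ".
report = "oops.wav = 3\nding.wav = 1\na = b.wav = 2"
print(swap_columns(report))
```

Only one of the two lists then has to be built inside the loop; the other one is derived from it in a single pass.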
Re: Tiagos thread
I'd express it in another way:
- if you put a simple variable inside quotes that means to quote it - and makes no sense
- i.e.: quoting a variable makes no sense (as highend says)
- BUT: you are allowed to concatenate a variable to a (double!) quoted string by just putting it into the quotes.
Example:
Echo "result is: " . $result; // basic syntax, variable unquoted
Echo "result is: $result"; // 'enhanced': $result inside the double quotes - no concat needed (it's implied)
Echo "result is: " . "$result"; // This works, but makes NO SENSE at all!
A situation where you *must* do some concatenation:
Echo 'Value of $x1 is: ' . $x1;
-> For $x1 to be written as is, it *must* be in single quotes, so you *must* put the 2nd $x1 outside of them. Of course: unquoted!
If you spell out "$result" *very* exactly, it means:
- the double quotes specify a *string*
- the string contains a variable name - that's resolved and then concatenated to the rest of the string
- the rest of the string is nothing - so you request to concatenate the contents of the var to nothing
- it's about the same as: "".$result."" i.e. concatenate the var to 2 strings containing nothing
Win11 Pro 223H2 Gerrman