Re: [discussion]Better regex engine for XYplorer
Posted: 03 Aug 2015 17:47
by highend
I could possibly help as well (but I guess Marco's will be sufficient)
Re: [discussion]Better regex engine for XYplorer
Posted: 03 Aug 2015 21:44
by bdeshi
Thanks!
I'll try to push my progress online soon.
I'll be away from usable internet for a few days, so this might go quiet...
I had been struggling with the problem of synchronizing message sending and receiving, so that both parties wait until the other is ready to continue.
I've since had a shiny idea involving permavar-dependent infinite while loops! It won't impress anybody with its beauty, but it works for the time being (pcrematch is basically done). Not slow either, at least not noticeably.
@Marco, you're right about the control flow,
except that not every | has to be escaped, only those that match the separator string exactly. [slightly faster]
There's another reason the separator has to be at least two characters: so that gettoken can retrieve complete tokens from the return. Otherwise even an escaped \| would be treated as a separator.
pcrematch
matches the pattern and returns a $sep-separated match list, where each match is escaped for the separator. Returnmatch is a match index; if it's defined, only a single match is returned, unescaped.
pcrecapture is supposed to return captured groups:
Code:
text pcrecapture("abc[xyz]<crlf>[1||2||3]", '(?mi)^(\w)*(\[.*?\])', 1, '||'); // abc
text pcrecapture("abc[xyz]<crlf>[1||2]", '(?mi)^(\w)*(\[.*?\])', 2, '||'); // [xyz]||[1]\|\|2
pcresplit splits a string at pattern matches. Got the idea from PHP.
text pcresplit('abcd,efgh.ijkl', '[,\.]'); //abcd||efgh||ijkl
At this point I'd like to say I felt extremely happy typing PCRE syntax in XY! :biggreen:
Re: [discussion]Better regex engine for XYplorer
Posted: 03 Aug 2015 23:52
by Marco
I had no timing problems with my proof of concept, but maybe that's because I tested only very simple patterns.
Anyway, the way to go would be to ask Don to implement a mode 3 for copydata, i.e. send and wait until a reply is received. Infinite while loops might burn CPU for nothing.
Re the control flow, this is better:
Code:
Input: string, regex, separator
1. Obtain an array of matches
2. "Replace/escape" all the 'separator' with '\separator' in the matches while they're still contained in the array
3. Flatten the array using 'separatorseparator' as separator between elements
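The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: Python's re module stands in for PCRE, and the function name join_matches is made up for this example; it is not part of xypcre or XYplorer scripting.

```python
import re

def join_matches(string, pattern, separator):
    """Hypothetical helper: escape-and-flatten as in steps 1 to 3 above."""
    # 1. Obtain an array of matches (Python's re, not PCRE; an assumption).
    matches = [m.group(0) for m in re.finditer(pattern, string)]
    # 2. Escape every occurrence of the separator while still in the array.
    escaped = [m.replace(separator, "\\" + separator) for m in matches]
    # 3. Flatten, using the doubled separator between elements.
    return (separator * 2).join(escaped)

print(join_matches("a|b c", r"\S+", "|"))  # prints a\|b||c
```

Escaping before flattening is the point of the ordering: once the list is joined, a separator inside a match can no longer be told apart from one between matches.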
Re: [discussion]Better regex engine for XYplorer
Posted: 04 Aug 2015 09:32
by bdeshi
Another problem is how to gracefully stop the UDF loop if xypcre hangs or crashes.
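One common way to keep a wait loop from spinning forever is to give it a deadline. The Python sketch below shows that general idea only; it is not XYplorer script, and wait_for_reply, read_reply, timeout and interval are hypothetical names chosen for this example.

```python
import time

def wait_for_reply(read_reply, timeout=5.0, interval=0.01):
    """Poll read_reply() until it returns something other than None,
    or give up once `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        reply = read_reply()
        if reply is not None:
            return reply
        # Sleep between polls so the loop does not burn CPU for nothing.
        time.sleep(interval)
    return None  # the other side hung or crashed; the caller decides what to do

# Usage: a fake helper that answers on the third poll.
answers = iter([None, None, "done"])
print(wait_for_reply(lambda: next(answers)))  # prints done
```

Returning None on timeout lets the calling function clean up (for example, clear its permavar) instead of waiting indefinitely.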
Re: [discussion]Better regex engine for XYplorer
Posted: 04 Aug 2015 11:00
by bdeshi
This is the pre-finalized format
Code:
pcrematch($string, $pattern, $sep='||', $returnmatch, $unescaped=0)
  string       String to search in (haystack).
  pattern      The RegExp pattern to search for in string (needle).
               All PCRE syntax is supported. Options can be defined
               using PCRE syntax, e.g. (?mi).*pattern.*
  separator    Separator between matches in the returned list. This
               must be at least two characters long. If only one
               character is given, it's silently doubled.
               Defaults to "||".
  returnmatch  1-based index of only one match to return. If this is
               greater than the total match count, the last one is returned.
               Ineffective if less than 1.
  unescaped    Turns off separator escaping in returned match(es).
               This is useful when it's known that the source string
               does not contain the separator string (e.g. it's a
               single line string, and separator is given as <crlf>).
By default, each character in the separator is escaped in returned matches
as \s, \e, \p...
so that a gettoken() on the return can retrieve matches in whole.
As a result, the retrieved token has to be run through a replacement command:
  replacelist('retrieved match', '\s,\e,\p', 's,e,p', ',')
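To make the parameter rules above concrete, here is a rough Python model of them. It is a sketch under assumptions, not the xypcre code: Python's re module replaces PCRE, the names pcrematch_sketch and escape_for are invented for this example, and the separator is assumed not to contain a backslash.

```python
import re

def escape_for(text, sep):
    """Backslash-escape each distinct character of the separator."""
    for ch in dict.fromkeys(sep):  # distinct characters, in order
        text = text.replace(ch, "\\" + ch)
    return text

def pcrematch_sketch(string, pattern, sep="||", returnmatch=0, unescaped=False):
    if len(sep) == 1:
        sep *= 2  # a one-character separator is silently doubled
    matches = [m.group(0) for m in re.finditer(pattern, string)]
    if returnmatch >= 1:
        if not matches:
            return ""
        # An index past the end returns the last match; single match is unescaped.
        return matches[min(returnmatch, len(matches)) - 1]
    if not unescaped:
        matches = [escape_for(m, sep) for m in matches]
    return sep.join(matches)

print(pcrematch_sketch("a|b c", r"\S+"))       # prints a\|b||c
print(pcrematch_sketch("x y z", r"\w", ","))   # prints x,,y,,z
print(pcrematch_sketch("x y z", r"\w", "||", 9))  # prints z
```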
All good?
the unescaped parameter and escaping rules will also be used in other functions that return tokenized strings.
Re: [discussion]Better regex engine for XYplorer
Posted: 04 Aug 2015 19:28
by bdeshi
Here's the latest draft.
Not compiled. The xyi and au3 are expected to be in "<xyscripts>\inc\xypcre\".
Only basic matching is implemented; escaping is not.
WARNING: looking at code may induce nausea and/or a feeling of hostility towards author.
WARNING 2: work in progress. not ready for use.
[attachment=0]xypcre.7z.xys[/attachment]
Re: [discussion]Better regex engine for XYplorer
Posted: 08 Aug 2015 10:26
by bdeshi
Added replace and group capture functions. Made a change to pcrematch so it returns global pattern matches.
Also changed the escaping scheme to square-bracket enclosing: [|]. The reason is that if a token ended with |, then after escaping it would become "token\|||" and a gettoken would return only up to the \.
(I know, it's unlike any regular scripting convention. How about a specialized gettoken mirror called pcretoken()?)
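The trailing-separator problem described here can be reproduced in a few lines of Python. This is an illustration only: str.split stands in for gettoken, and escape_backslash and escape_bracket are hypothetical names, not xypcre functions.

```python
def escape_backslash(token, ch="|"):
    # Old scheme: prefix each separator character with a backslash.
    return token.replace(ch, "\\" + ch)

def escape_bracket(token, ch="|"):
    # New scheme: enclose each separator character in square brackets.
    return token.replace(ch, "[" + ch + "]")

tokens = ["token|", "next"]

old = "||".join(escape_backslash(t) for t in tokens)
print(old)              # prints token\|||next
print(old.split("||"))  # first part is cut off right after the backslash

new = "||".join(escape_bracket(t) for t in tokens)
print(new)              # prints token[|]||next
restored = [part.replace("[|]", "|") for part in new.split("||")]
print(restored)         # prints ['token|', 'next']
```

With brackets, an escaped character can never sit directly next to a real separator, so the split lands in the right place.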
[attachment=0]xypcre.7z.xys[/attachment]
Re: [discussion]Better regex engine for XYplorer
Posted: 08 Aug 2015 15:55
by Papoulka
Not to teach you guys to suck eggs, but this is on-topic and could help other newbies like me...
I lamented the lack of lookbehind because I often want to find strings that don't match a regex pattern. I finally realized that XY can do great things in that regard using "Invert" and especially Boolean Regex. IMVHO these make XY much more powerful for pure matching than any single regex engine could be. Even if not, these features are far easier to use than creating standard regexes for the same tasks.
Of course this doesn't address arrays, or replacements, and perhaps many other modern features. But it greatly increases the utility of the engine we have.
Re: [discussion]Better regex engine for XYplorer
Posted: 08 Aug 2015 16:55
by bdeshi
that's an advantage from the scripting side, rather than the regex. As a result, this can be achieved using any other regex engine.
Re: [discussion]Better regex engine for XYplorer
Posted: 08 Aug 2015 19:41
by Papoulka
I was referring to plain "File Find", and what it can do without resorting to scripting.
In fact I would like to know how to use e.g. boolean regex to find files via a script. I know there is a way but have been thinking it's too difficult for me, meaning that the time it would take me to learn or relearn it would be far more than it would save. Not to further hijack this thread - I already have one:
http://www.xyplorer.com/xyfc/viewtopic.php?f=5&t=14404 and welcome any tips there.
Re: [discussion]Better regex engine for XYplorer
Posted: 09 Aug 2015 21:01
by bdeshi
back to xypcre:
I've added a pcretoken() (~gettoken) function to sidestep much of that escape/unescape conundrum.
Now each function that can return multiple tokens has a $format param (in place of $unesc):
format
  0: just return matches separated by separator (equiv. to $unesc=1)
  1: matches are escaped against separator chars (equiv. to $unesc=0)
  2: return in this format:
     token1length,token2length|token1token2
  (This is not a bitfield; 3 is not a valid choice.)
  $separator is irrelevant when format=2.
pcretoken($tokenlist, token, separator, format)
returns one match. Takes care of unescaping tokens.
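Format 2 avoids escaping altogether because the lengths say where each token starts and ends. A small Python sketch of that layout follows; pack_lengths and token_at are hypothetical names for this example and do not reflect how pcretoken() is actually implemented.

```python
def pack_lengths(tokens):
    """Build 'len1,len2,...|token1token2...' from a list of tokens."""
    header = ",".join(str(len(t)) for t in tokens)
    return header + "|" + "".join(tokens)

def token_at(packed, index):
    """Return the token at 1-based `index` from a packed string."""
    # The header holds only digits and commas, so the first '|' ends it.
    header, _, body = packed.partition("|")
    if not header:
        return ""
    lengths = [int(n) for n in header.split(",")]
    start = sum(lengths[:index - 1])
    return body[start:start + lengths[index - 1]]

packed = pack_lengths(["ab|c", "", "xyz"])
print(packed)               # prints 4,0,3|ab|cxyz
print(token_at(packed, 1))  # prints ab|c
print(token_at(packed, 3))  # prints xyz
```

Tokens may contain any character, including | and commas, since nothing in the body is ever searched for a delimiter.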
So how does it sound?
This is an idea to make it all cleaner and more efficient.
The usual gettoken() way is still possible, of course.
If all goes well, this project might be ready for testing and even more feedback by tomorrow! yay..
Re: [discussion]Better regex engine for XYplorer
Posted: 10 Aug 2015 11:36
by bdeshi
Here's the "semi-final", USABLE edition!
Contains the xyi and compiled exe.
Please test and help fix bugs or add/change features!
This topic and the xyi contain much explanation and usage notes.
Functions that return multiple matches can return data in a particular format. Use pcretoken() to get one match from this return.
[attachment=0]xypcre.7z.xys[/attachment]
Re: [discussion]Better regex engine for XYplorer
Posted: 11 Aug 2015 16:42
by bdeshi
almost forgot the splitting function!
Code:
/*pcresplit()
  Splits a string into substrings at each position
  where a regexp pattern matches.
  Returns substrings in defined format.
  $string   String to work on.
  $pattern  The RegExp pattern to match.
            The portion that matches is removed.
            Part of the pattern can be skipped:
            (?<=pre)pattern(?=post)
  $sep      same as in pcrematch()
  $format   same as in pcrematch()
  Notes: see notes of pcrematch()
*/
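The effect of the lookbehind/lookahead trick can be tried out in Python, whose re module supports the same (?<=...) and (?=...) syntax. This is a sketch, not pcresplit() itself: split_sketch is a hypothetical name and the "||" join simply mimics the default separator.

```python
import re

def split_sketch(string, pattern, sep="||"):
    """Split at every pattern match and join the pieces with sep."""
    return sep.join(re.split(pattern, string))

# The matched portion is removed from the output.
print(split_sketch("abcd,efgh.ijkl", r"[,.]"))        # prints abcd||efgh||ijkl

# Lookarounds are checked but not consumed, so only the comma that sits
# between two digits is removed; the digits themselves are kept.
print(split_sketch("1,2, a,b", r"(?<=\d),(?=\d)"))    # prints 1||2, a,b
```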
[attachment=0]xypcre.zip[/attachment]
btw, I have noticed simple patterns like '.*', '' etc can hang the processor.
Hit ESC to quit from XY, then right-click the xypcre icon in the taskbar and exit. (Optionally clear the latest permavar with this type of name: $P_UDF_pcre_IFS*)
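Both '.*' and '' can match an empty string, and a match-everything loop that restarts exactly where the previous match ended will never move past a zero-length match. Whether that is what happens inside xypcre is not known from this thread; the Python sketch below only shows the general guard, with all_matches and limit as hypothetical names.

```python
import re

def all_matches(pattern, text, limit=1000):
    """Collect successive matches, always advancing past empty ones."""
    regex = re.compile(pattern)
    found, pos = [], 0
    while pos <= len(text) and len(found) < limit:
        m = regex.search(text, pos)
        if m is None:
            break
        found.append(m.group(0))
        # A zero-length match ends where it starts; step one character
        # forward so the next search cannot return the same spot again.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return found

print(all_matches(".*", "abc"))  # prints ['abc', '']
print(all_matches("", "abc"))    # prints ['', '', '', '']
```

The limit argument is a second safety net: even if a pattern produces far more matches than expected, the loop still ends.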
Re: [discussion]Better regex engine for XYplorer
Posted: 12 Aug 2015 01:00
by Papoulka
I have noticed simple patterns like '.*', '' etc can hang the processor
FYI, ".*" can cause a lot of unexpected though usually harmless internal engine backtracking. Ref. e.g.
https://blog.mariusschulz.com/2014/06/0 ... ually-want
So the processor may not be hung, but rather churning through some loop many more times than anticipated.
Re: [discussion]Better regex engine for XYplorer
Posted: 12 Aug 2015 08:39
by bdeshi
I know, this is what I meant (or thought I meant) in layman's terms. Thanks for the clarification.