XYplorer Beta Club

Posted: **03 Aug 2015 17:47**

I could possibly help as well (but I guess Marco's will be sufficient).

Posted: **03 Aug 2015 21:44**

thanks!
I'll try to push my progress online soon.
Away from usable internet for a few days, this might go sleepy...

I had been struggling with the problem of synchronizing message sending and receiving, so that both parties wait until the other is ready to continue.
I've since had a shiny idea concerning permavar-dependent infinite while loops! It won't impress anybody with its beauty, but it works for the time being (pcrematch is basically done). Not slow either, not noticeably at least.

@Marco, you're right about the control flow, ~~except not every | has to be escaped, but only when they match the separator string exactly. [slightly faster]~~
There's another reason the separator has to be at least two characters: so that gettoken can retrieve complete tokens from the return. Else even an escaped \| will be considered as a separator.

pcrematch
matches the pattern and returns a $sep-separated match list, in which each match is escaped against the separator. $returnmatch is a match index; if it is defined, only that single match is returned, unescaped.

pcrecapture is supposed to return captured groups

Code:

text pcrecapture("abc[xyz]<crlf>[1||2||3]", '(?mi)^(\w)*(\[.*?\])', 1, '||'); // abc
text pcrecapture("abc[xyz]<crlf>[1||2]", '(?mi)^(\w)*(\[.*?\])', 2, '||'); // [xyz]||[1]\|\|2

pcresplit splits a string at pattern matches. Got the idea from PHP.
text pcresplit('abcd,efgh.ijkl', '[,\.]'); //abcd||efgh||ijkl

at this point I'd like to say, I felt extremely happy, typing pcre syntax in XY! :biggreen:

Posted: **03 Aug 2015 23:52**

I had no problems of timing with my proof of concept, but maybe because I tested very simple patterns.
Anyway, the way to go would be to ask Don to implement a mode 3 for copydata, i.e. send and wait until a reply is received. Infinite while loops might be CPU intensive for nothing.

Re the control flow. Better:

Code:

Input: string, regex, separator

1. Obtain an array of matches
2. "Replace/escape" all the 'separator' with '\separator' in the matches while they're still contained in the array
3. Flatten the array using 'separatorseparator' as separator between elements
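The three steps above can be sketched roughly as follows. This is Python used purely for illustration, not XYplorer script: `flatten_matches` is a hypothetical name, and Python's `re` module stands in for the PCRE engine.

```python
import re

def flatten_matches(string, pattern, separator="|"):
    # Step 1: obtain the list of whole-match strings.
    matches = [m.group(0) for m in re.finditer(pattern, string)]
    # Step 2: escape the separator inside each match while it is still a list item.
    escaped = [m.replace(separator, "\\" + separator) for m in matches]
    # Step 3: flatten, using the doubled separator between elements.
    return (separator * 2).join(escaped)

print(flatten_matches("a|b c|d e", r"\S+"))  # a\|b||c\|d||e
```

Escaping before joining means the separators added in step 3 are never escaped themselves.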

Posted: **04 Aug 2015 09:32**

another problem is to gracefully stop the udf loop if xypcre hangs or crashes.
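One common way to keep a wait loop from spinning forever is to give it a deadline and a short sleep between checks. The sketch below is Python for illustration only; `wait_for_reply` and `read_reply` are hypothetical names, and nothing here models permavars or copydata.

```python
import time

def wait_for_reply(read_reply, timeout=5.0, poll_interval=0.01):
    """Poll read_reply() until it returns something other than None, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        reply = read_reply()
        if reply is not None:
            return reply
        time.sleep(poll_interval)  # yield the CPU instead of busy-waiting
    return None  # timed out: the caller decides how to clean up

# Simulated helper that answers on the third poll.
replies = iter([None, None, "ok"])
print(wait_for_reply(lambda: next(replies), timeout=1.0))  # ok
```

A `None` result lets the caller tell "helper hung or crashed" apart from a real reply.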

Posted: **04 Aug 2015 11:00**

This is the pre-finalized format

Code:

pcrematch($string, $pattern, $sep='||', $returnmatch, $unescaped=0)

string       string to search in (haystack)

pattern      The RegExp pattern to search for in string (needle).
             All PCRE syntax is supported. Options can be defined
             using PCRE syntax, eg, (?mi).*pattern.*

separator    Separator between matches in the returned list. This
             must be at least two characters long. If only one
             character is given, it's silently doubled.
             Defaults to "||".

returnmatch  1-based index of only one match to return. If this is
             greater than total matchcount, the last one is returned.
             Ineffective if less than 1.

unescaped    Turns off separator escaping in returned match(es)
             This is useful when it's known that the source string
             does not contain the separator string. (eg, it's a single
             line string, and separator is given as <crlf>)

By default, each character in the separator is escaped in returned matches
as \s, \e, \p...
So that a gettoken() on the return can retrieve matches in whole.
As a result, the retrieved token has to be run through a replacement command:
replacelist('retrieved match', '\s,\e,\p', 's,e,p',',')

All good?

the unescaped parameter and escaping rules will also be used in other functions that return tokenized strings.
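To make the parameter rules above concrete, here is a rough Python model of them. It is a sketch, not the xypcre code: `pcrematch_sketch`, `escape` and `unescape` are hypothetical names, Python's `re` stands in for PCRE, and returning a single match unescaped is an assumption taken from the earlier description of pcrematch.

```python
import re

def escape(token, sep):
    # Prefix every distinct separator character with a backslash.
    for ch in dict.fromkeys(sep):
        token = token.replace(ch, "\\" + ch)
    return token

def unescape(token, sep):
    # Reverse of escape(); mirrors the replacelist() step on a retrieved token.
    for ch in dict.fromkeys(sep):
        token = token.replace("\\" + ch, ch)
    return token

def pcrematch_sketch(string, pattern, sep="||", returnmatch=0, unescaped=False):
    if len(sep) == 1:
        sep *= 2  # a one-character separator is silently doubled
    matches = [m.group(0) for m in re.finditer(pattern, string)]
    if not matches:
        return ""
    if returnmatch >= 1:
        # 1-based index; an index past the end falls back to the last match.
        # Assumption: a single match is returned unescaped.
        return matches[min(returnmatch, len(matches)) - 1]
    if not unescaped:
        matches = [escape(m, sep) for m in matches]
    return sep.join(matches)

print(pcrematch_sketch("a|b c", r"\S+"))               # a\|b||c
print(pcrematch_sketch("x y z", r"\w", returnmatch=9))  # z
```

The sketch does not handle a separator that itself contains a backslash.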

Posted: **04 Aug 2015 19:28**

here's the latest draft.
not compiled. the xyi and au3 are expected to be in "<xyscripts>\inc\xypcre\"
only basic matching is implemented. escaping is not.

WARNING: looking at code may induce nausea and/or a feeling of hostility towards author.
WARNING 2: work in progress. not ready for use.
~~[attachment=0]xypcre.7z.xys[/attachment]~~

Posted: **08 Aug 2015 10:26**

Added replace and group capture functions. Made a change to pcrematch so it returns global pattern matches.
Also changed the escaping scheme to square-bracket enclosing, [|], because if a token ended with | then after escaping it would become "token\|||" and a gettoken would return only up to the \.
(I know, it's unlike any regular scripting convention. How about a specialized gettoken mirror called pcretoken()?)
~~[attachment=0]xypcre.7z.xys[/attachment]~~
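The trailing-separator problem described above can be reproduced in a few lines. This is a Python illustration with hypothetical helper names; `str.split` plays the role of gettoken.

```python
SEP = "||"

def escape_backslash(token):
    return token.replace("|", "\\|")

def escape_bracket(token):
    return token.replace("|", "[|]")

def unescape_bracket(token):
    return token.replace("[|]", "|")

tokens = ["token|", "next"]
joined_backslash = SEP.join(escape_backslash(t) for t in tokens)  # token\|||next
joined_bracket = SEP.join(escape_bracket(t) for t in tokens)      # token[|]||next

# Backslash scheme: the first "||" found begins with the escaped "|",
# so the first token is cut off right after the backslash.
print(joined_backslash.split(SEP))  # ['token\\', '|next']

# Bracket scheme: an escaped "|" is always followed by "]", never by "|".
print([unescape_bracket(t) for t in joined_bracket.split(SEP)])  # ['token|', 'next']
```

A token that already contains a literal "[|]" would be ambiguous in this simple sketch.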

Posted: **08 Aug 2015 15:55**

Not to teach you guys to suck eggs, but this is on-topic and could help other newbies like me...

I lamented the lack of lookbehind because I often want to find strings that don't match a regex pattern. I finally realized that XY can do great things in that regard using "Invert" and especially Boolean Regex. IMVHO these make XY much more powerful for pure matching than any single regex engine could be. Even if not, these features are far easier to use than creating standard regexes for the same tasks.

Of course this doesn't address arrays, or replacements, and perhaps many other modern features. But it greatly increases the utility of the engine we have.

Posted: **08 Aug 2015 16:55**

that's an advantage from the scripting side, rather than the regex. As a result, this can be achieved using any other regex engine.

Posted: **08 Aug 2015 19:41**

I was referring to plain "File Find", and what it can do without resorting to scripting.

In fact I would like to know how to use eg. boolean regex to find files via a script. I know there is a way but have been thinking it's too difficult for me. Meaning that the time it would take me to learn / relearn it would be >> more than it would save. Not to further hijack this thread - I already have one: http://www.xyplorer.com/xyfc/viewtopic.php?f=5&t=14404 and welcome any tips there.

Posted: **09 Aug 2015 21:01**

back to xypcre:
I've added a pcretoken() (~gettoken) function to sidestep much of that escape/unescape conundrum.

Now each function that can return multiple tokens has a $format param (in place of $unesc).
format
0: just return matches separated by separator (equiv to $unesc=1)
1: matches are escaped against separator chars (equiv to $unesc=0)
2: return in this format:
token1length,token2length|token1token2
(this is not a bitfield -- 3 is not a valid choice.)
$separator is irrelevant when format=2

pcretoken($tokenlist, token, separator, format)
returns one match. Takes care of unescaping tokens.
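Format 2 is a length-prefixed layout, so no escaping is needed at all. A minimal Python sketch of encoding and decoding it follows; the function names are hypothetical, and measuring lengths in characters is an assumption.

```python
def encode_format2(tokens):
    # "len1,len2,...|token1token2..."
    header = ",".join(str(len(t)) for t in tokens)
    return header + "|" + "".join(tokens)

def decode_format2(data):
    # Only the first "|" ends the header; later ones belong to the tokens.
    header, _, body = data.partition("|")
    lengths = [int(n) for n in header.split(",")] if header else []
    tokens, pos = [], 0
    for n in lengths:
        tokens.append(body[pos:pos + n])
        pos += n
    return tokens

def token_at(data, index):
    # 1-based lookup, in the spirit of pcretoken() with format=2.
    return decode_format2(data)[index - 1]

packed = encode_format2(["ab|c", "d,e"])
print(packed)               # 4,3|ab|cd,e
print(token_at(packed, 2))  # d,e
```

Tokens are sliced out by length, so they can contain "|" or "," without confusing the decoder.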

So how does it sound?

This is an idea to make it all cleaner and more efficient.
the usual gettoken() way is still possible of course.

If all goes well, this project might be ready for testing and even more feedback by tomorrow! yay..

Posted: **10 Aug 2015 11:36**

here's the "semi-final", USABLE edition!
contains xyi and compiled exe.

Please test and help fix bugs or add/change features!
This topic and the xyi contain much explanation and usage notes.

functions that return multiple matches can return data in a particular format. Use pcretoken() to get one match from this return.

~~[attachment=0]xypcre.7z.xys[/attachment]~~

Posted: **11 Aug 2015 16:42**

almost forgot the splitting function!

Code:

/*pcresplit()
   Splits a string into substrings at each position
   where a regexp pattern matches.
   Returns substrings in defined format.
$string   String to work on.
$pattern  The RegExp pattern to match.
          The portion that matches is removed.
          Part of the pattern can be skipped:
          (?<=pre)pattern(?=post)
$sep      same as in pcrematch()
$format   same as in pcrematch()
Notes: see notes of pcrematch()
*/
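
The splitting behaviour described in the comment block can be approximated in Python like this. It is an illustrative sketch only: `pcresplit_sketch` is a hypothetical name, Python's `re` stands in for PCRE, and the $format handling is left out.

```python
import re

def pcresplit_sketch(string, pattern, sep="||"):
    # re.split removes the matched text and returns the pieces around it.
    # Assumes the pattern has no capturing groups (re.split would include them).
    return sep.join(re.split(pattern, string))

# The matched "," or "." is removed.
print(pcresplit_sketch("abcd,efgh.ijkl", r"[,\.]"))  # abcd||efgh||ijkl

# Lookbehind/lookahead are tested but not consumed: only the "," between
# "a" and "b" is a split point, and "a" and "b" themselves are kept.
print(pcresplit_sketch("a,b,c", r"(?<=a),(?=b)"))    # a||b,c
```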

~~[attachment=0]xypcre.zip[/attachment]~~

btw, I have noticed simple patterns like '.*', '' etc can hang the processor.
Hit ESC to quit from XY, and right click xypcre icon in taskbar and exit. (optionally clear latest permavar with this type of name: $P_UDF_pcre_IFS*)

Posted: **12 Aug 2015 01:00**

I have noticed simple patterns like '.*', '' etc can hang the processor

FYI, ".*" can cause a lot of unexpected though usually harmless internal engine backtracking. Ref eg. https://blog.mariusschulz.com/2014/06/0 ... ually-want So the processor may not be hung but is churning through some loop many more times than anticipated.
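A separate point that may be worth checking: patterns such as '.*' and '' can match the empty string, and a hand-written "find all matches" loop that resumes at the end of the previous match never advances past a zero-length match. Whether this plays any part in the xypcre hang is an assumption, not something established in this thread. The Python sketch below (hypothetical `all_matches`) shows the guard such a loop needs.

```python
import re

def all_matches(pattern, text):
    rx = re.compile(pattern)
    pos, out = 0, []
    while pos <= len(text):
        m = rx.search(text, pos)
        if m is None:
            break
        out.append(m.group(0))
        # Guard: after a zero-length match, step forward one character.
        # Without the "+ 1" the loop would find the same empty match forever.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return out

print(all_matches(".*", "abc"))  # ['abc', '']
print(all_matches("", "abc"))    # ['', '', '', '']
```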

Posted: **12 Aug 2015 08:39**

I know, this is what I meant (or thought I meant) in layman's terms. Thanks for the clarification.

[discussion]Better regex engine for XYplorer