RegEx: fetching the rightmost pattern?

Posted: 30 Dec 2014 13:30
by Filehero
Hi,

my nightmare subject. :mrgreen:

Given the strings

Code: Select all

1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
2) "c:\PK5_7404 (2014_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
3) "c:\PK5_7404 (2014_09_16 1) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
What is the proper RegEx to always perfectly get the rightmost timestamp "2014_09_16 19_38_20"?

Is there something better than

Code: Select all

"(\d{4}_\d{2}_\d{2})[^(UTC)]*(?=\sUTC\))"
?
Of course, in cases like 1) it will give me more than one match. But I can't find the right non-greedy/negating expression to always get the rightmost one on the first hit. :?


Thanks,
Filehero
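
For readers who want to reproduce the behaviour described above, here is a minimal sketch in Python's `re` module. That is an assumption: it is not XYplorer's engine, although this pattern uses only constructs both should share. The helper name `all_matches` is made up for illustration. The sketch also shows that `[^(UTC)]` is a negated character class (any single character other than `(`, `U`, `T`, `C` or `)`), not "anything but the word UTC".

```python
import re

# Pattern from the post above, unchanged.
PATTERN = re.compile(r"(\d{4}_\d{2}_\d{2})[^(UTC)]*(?=\sUTC\))")

# The three sample strings from the post, without the surrounding quotes.
SAMPLES = [
    r"c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)",
    r"c:\PK5_7404 (2014_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)",
    r"c:\PK5_7404 (2014_09_16 1) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)",
]


def all_matches(text):
    """Return the full text of every match, left to right (hypothetical helper)."""
    return [m.group(0) for m in PATTERN.finditer(text)]


for sample in SAMPLES:
    # Sample 1 yields two matches, samples 2 and 3 yield one each.
    print(len(all_matches(sample)), all_matches(sample))

# The class excludes single characters, so "UTC" merely splits the runs:
print(re.findall(r"[^(UTC)]+", "a UTC b"))
```

Since matches come back left to right, the rightmost one is the last list element; the replies below avoid the list altogether.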

Re: RegEx: fetching the rightmost pattern?

Posted: 30 Dec 2014 13:38
by bdeshi

Code: Select all

"\([^(]+UTC\)$"

Re: RegEx: fetching the rightmost pattern?

Posted: 30 Dec 2014 14:55
by binocular222

Code: Select all

\([0-9_ ]+UTC\)$
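
The two end-anchored patterns posted so far can be tried out with a short Python sketch (again Python's `re`, not XYplorer's engine). `WITH_GROUP`, `last_block` and `timestamp_only` are hypothetical names, and the capture-group variant is an illustration that does not appear in the thread.

```python
import re

SAMPLE = r"c:\PK5_7404 (2014_09_16 19_38_34 UTC) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"

# The two patterns posted above, unchanged.
POSTED = [r"\([^(]+UTC\)$", r"\([0-9_ ]+UTC\)$"]

# Hypothetical variant: same idea, plus a capture group around the timestamp.
WITH_GROUP = r"\(([0-9_ ]+) UTC\)$"


def last_block(pattern, text):
    """Return the whole match, including the parentheses and UTC, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def timestamp_only(text):
    """Return just the timestamp captured by group 1, or None."""
    m = re.search(WITH_GROUP, text)
    return m.group(1) if m else None


for pattern in POSTED:
    print(last_block(pattern, SAMPLE))
print(timestamp_only(SAMPLE))

# The "$" anchor needs the block at the very end: a trailing quote breaks it.
print(last_block(POSTED[0], SAMPLE + '"'))
```

Both posted patterns return the final block including its parentheses and `UTC`, and they only work when that block really ends the string.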

Re: RegEx: fetching the rightmost pattern?

Posted: 30 Dec 2014 16:18
by bdeshi

Code: Select all

"^.+(\([^(]+UTC\))"
[$1 == match]

Code: Select all

 $str = "c:\PK5_7404 (2014_09_16 19_38_34 UTC) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)";
 // the greedy "^.+" runs on to the last "(...UTC)" block; "$1" returns just that block
 $ret = regexreplace($str,"^.+(\([^(]+UTC\))","$1");
 echo $ret;

Re: RegEx: fetching the rightmost pattern?

Posted: 30 Dec 2014 16:25
by Stefan
Filehero wrote:Given the strings

Code: Select all

1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
2) "c:\PK5_7404 (2014_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
3) "c:\PK5_7404 (2014_09_16 1) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
What is the proper RegEx to always perfectly get the rightmost timestamp "2014_09_16 19_38_20"?
The regex engine used here is greedy by default.
So ".+\(" runs as far to the right as it can and stops at the rightmost opening parenthesis "(" after which the rest of the pattern still matches.
Next, "(.+)" captures everything up to " UTC", and ".*" swallows the rest of the line.

So I would use the expression ".+\((.+) UTC.*" and use "$1" to get back what was matched by our "(.+)" group.


POC:

Code: Select all

$input = <<<$$$
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
$$$;

 ForEach( $line, $input, "<crlf>"){
   //regexreplace(string, pattern, replacement, [matchcase])
   $result = regexreplace($line, ".+\((.+) UTC.*", "$1");
   msg "$line<crlf 3>$result";
 }

Results:

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"


2015_09_16 19_38_20
---------------------------
OK   
---------------------------

Code: Select all

---------------------------
XYplorer
---------------------------
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"


2017_09_16 19_38_20
---------------------------
OK   
---------------------------

Code: Select all

---------------------------
XYplorer
---------------------------
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"


2019_09_16 19_38_20
---------------------------
OK   
---------------------------
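
The same greedy cut-out can be sketched in Python for comparison. This is Python's `re`, not XYplorer's engine, and `cut_out` and `cut_out_strict` are hypothetical helper names. In Python the back reference is written `\1` rather than `$1`.

```python
import re

LINES = [
    r'1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"',
    r'2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"',
    r'3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"',
]

# Pattern from the post above, unchanged.
PATTERN = r".+\((.+) UTC.*"


def cut_out(line):
    """Replace the whole line by capture group 1 (the rightmost timestamp)."""
    return re.sub(PATTERN, r"\1", line)


def cut_out_strict(line):
    """Like cut_out, but return None instead of the input when nothing matches."""
    m = re.fullmatch(PATTERN, line)
    return m.group(1) if m else None


for line in LINES:
    print(cut_out(line))

# Caveat of the replace approach: a non-matching line comes back unchanged.
print(cut_out("no timestamp here"))
print(cut_out_strict("no timestamp here"))
```

The strict variant is a design choice for cases where a silently unchanged input would be mistaken for a timestamp.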

HTH? :D

Re: RegEx: fetching the rightmost pattern?

Posted: 30 Dec 2014 17:28
by Filehero
Hi binocular222, Sammay & Stefan
Stefan wrote:HTH? :D
Yes, all of you. Thanks a lot! :D


So with regexmatches, mine was the shortest? Or is it a rule of thumb that if I don't want the "boundaries" to be part of the result string, I should use regexreplace from the very beginning?
Stefan wrote:So I would use the expression ".+\((.+) UTC.*" and use "$1" to get back what was matched by our "(.+)" group.
In this case, "$1" is not a real "replacement" but the back reference to the match, which then will be returned instead of a replaced match?


Me and RegEx - each time a new adventure, but a funny one. :ninja:

Cheers,
Filehero

Re: RegEx: fetching the rightmost pattern?

Posted: 30 Dec 2014 18:14
by Stefan
Filehero wrote:In this case, "$1" is not a real "replacement"
but the back reference to the match, which then will be returned instead of a replaced match?
Right.
IOW: "$1" is always the back reference to the first capture group. It is often used as part of a replacement or, like here, to cut out one piece of the match.
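
The two uses of a back reference, inside a real replacement and as a cut-out, can be put side by side in a small Python sketch. The assumptions: Python's `re` rather than XYplorer's engine, `\1` instead of `$1`, and the made-up helper names `to_dotted` and `keep_date`.

```python
import re

TEXT = "PK5 (2014_09_16 19_38_20 UTC)"


def to_dotted(text):
    """Use 1: a real replacement that rearranges the captured pieces."""
    return re.sub(r"(\d{4})_(\d{2})_(\d{2})", r"\3.\2.\1", text)


def keep_date(text):
    """Use 2: match the whole string, then keep only what group 1 captured."""
    return re.sub(r".*?(\d{4}_\d{2}_\d{2}).*", r"\1", text)


print(to_dotted(TEXT))  # the date is rewritten, everything else stays
print(keep_date(TEXT))  # everything except the date is dropped
```

In the first call the pattern covers only the date, so the surroundings survive. In the second it covers the whole string, so only the group is left.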



Explanations:

regexmatches()
Returns a LIST of all matches of a regular expression pattern in a given string.

regexreplace()
Replaces parts of a string, using a regular expression pattern.


So with regexmatches() you could, for example, collect all timestamps into a new list (like an array)
and then pick one from that list, or process them one after the other in a foreach loop.

Code: Select all

   //regexmatches(string, pattern, [separator="|"], [matchcase=0])
   $result = regexmatches($input, "[\d_ ]{19}", "<crlf>"); 
With regexreplace() you can turn parts of the original string into a new string, or just cut out the part you want (as I did):

Code: Select all

   //regexreplace(string, pattern, replacement, [matchcase])
   $result = regexreplace($line, ".+\((.+) UTC.*", "$1");



The differences should be clearer now: depending on the case, you have to choose the right tool for the job.
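
As a rough analogy outside XYplorer, Python's `re.findall` plays the role of regexmatches() and `re.sub` the role of regexreplace(). The mapping is an assumption made for illustration; the two engines are not identical. The sketch shows the list-versus-string difference on the sample input.

```python
import re

TEXT = "\n".join([
    r'1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"',
    r'2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"',
    r'3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"',
])

# List of ALL matches in the whole text (the regexmatches() idea).
# "2018_09_16 1" is shorter than 19 characters, so it is not in the list.
stamps = re.findall(r"[\d_ ]{19}", TEXT)

# One rewritten string per line (the regexreplace() idea).
rightmost = [re.sub(r".+\((.+) UTC.*", r"\1", line) for line in TEXT.splitlines()]

print(stamps)
print(rightmost)
```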


###################################




POC:


regexmatches()
Returns a LIST of all matches of a regular expression pattern in a given string. (would be one single match only, in our case above)

Code: Select all

$input = <<<$$$
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
$$$;

   //regexmatches(string, pattern, [separator="|"], [matchcase=0])
   $result = regexmatches($input, "[\d_ ]{19}", "<crlf>");
   msg "$input<crlf 3>$result";
 
Result:

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"


2014_09_16 19_38_20
2015_09_16 19_38_20
2016_09_16 19_38_20
2017_09_16 19_38_20
2019_09_16 19_38_20
---------------------------
OK   
---------------------------

OR JUST THE LAST ONE FROM THE RESULT LIST:

Code: Select all

$input = <<<$$$
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
$$$;


 ForEach( $line, $input, "<crlf>"){

   // collect the matching parts of the original string into a result string $tmp:
   //regexmatches(string, pattern, [separator="|"], [matchcase=0])
   $tmp = regexmatches($line, "[\d_ ]+", ",");

   // get the last (" -1 ") item from the $tmp string:
   //gettoken(string, [index=1], [separator=" "], [format], [flags])
   $result = gettoken($tmp, -1, ",");

   // example output:
   msg "$line<crlf 3>$result";
 }
Results:

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"

2015_09_16 19_38_20 
---------------------------
OK   
---------------------------
and so on...
2017_09_16 19_38_20
2019_09_16 19_38_20
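
The "take the last item of the match list" idea looks like this in Python (a sketch in Python's `re`, not XYplorer's engine, with a hypothetical helper `last_run`). It also shows why the result above carries a trailing space: the open-ended class `[\d_ ]+` includes the space before `UTC`.

```python
import re

LINE = r'1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"'


def last_run(line):
    """Return the last run of digits, underscores and spaces, or None."""
    runs = re.findall(r"[\d_ ]+", line)
    return runs[-1] if runs else None


print(repr(last_run(LINE)))          # note the trailing space
print(repr(last_run(LINE).strip()))  # trimmed

# A fixed width of 19 characters avoids the trailing space.
print(repr(re.findall(r"[\d_ ]{19}", LINE)[-1]))

# Caveat: any later digit or space becomes the "last run" instead.
print(repr(last_run(LINE + " copy 2")))
```

The open-ended class is only safe when nothing after the final timestamp can match it, which holds for the sample lines but not in general.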


------------- or:


FOR EACH ITEM FROM THE RESULT LIST:

Code: Select all

$input = <<<$$$
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
$$$;


 ForEach( $line, $input, "<crlf>"){

      //regexmatches(string, pattern, [separator="|"], [matchcase=0])
      $tmp = regexmatches($line, "[\d_ ]{19}", ",");
      
      ForEach( $index, "1,2", ","){
          //gettoken(string, [index=1], [separator=" "], [format], [flags])
          $result = gettoken($tmp, $index , ",");
          if($result){msg "$line<crlf 3>$result";}
      }
 }
 
Results:

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"

2014_09_16 19_38_20
---------------------------
OK   
---------------------------

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"

2015_09_16 19_38_20
---------------------------
OK   
---------------------------
and so on...

2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
2016_09_16 19_38_20

2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
2017_09_16 19_38_20

3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
2019_09_16 19_38_20



###################################

regexreplace()
Replaces parts of a string, using a regular expression pattern.

Code: Select all

$input = <<<$$$
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
$$$;

 ForEach( $line, $input, "<crlf>"){
   //regexreplace(string, pattern, replacement, [matchcase])
   $result = regexreplace($line, ".+\((.+) UTC.*", "$1");
   msg "$line<crlf 3>$result";
 }

Results:

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"


2015_09_16 19_38_20
---------------------------
OK   
---------------------------
and so on...
2017_09_16 19_38_20
2019_09_16 19_38_20



 
HTH? :D

Re: RegEx: fetching the rightmost pattern?

Posted: 30 Dec 2014 18:19
by Filehero
Stefan wrote:HTH? :D
Stefan, as always: yes! :tup: :tup: :tup:

Thanks for the further explanations. I'm sure they will serve some more people over here.

FH