RegEx: fetching the rightmost pattern?

Filehero
Posts: 2644
Joined: 27 Feb 2012 18:50
Location: Windows 10 Pro x64

Post by Filehero »

Hi,

my nightmare subject. :mrgreen:

Given the strings

Code: Select all

1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
2) "c:\PK5_7404 (2014_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
3) "c:\PK5_7404 (2014_09_16 1) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
What is the proper RegEx to always perfectly get the rightmost timestamp "2014_09_16 19_38_20"?

Is there something better than

Code: Select all

"(\d{4}_\d{2}_\d{2})[^(UTC)]*(?=\sUTC\))"
?
Of course, in cases like 1) it gives me more than one match. But I can't find the proper non-greedy/negating expression that always returns the rightmost one as the first hit. :?


Thanks,
Filehero
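
A side note on the attempt above: `[^(UTC)]` is a negated character class, not a negated word. It matches any single character other than `(`, `U`, `T`, `C` or `)`. The sketch below is in Python rather than XYplorer script, on the assumption that both engines treat this construct the same way; the sample string is case 1 without its surrounding quotes.

```python
import re

# Sample 1 from the post (without the surrounding quotes).
s = r"c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"

pattern = r"(\d{4}_\d{2}_\d{2})[^(UTC)]*(?=\sUTC\))"

# finditer yields matches left to right, so the rightmost one comes last.
matches = [m.group(0) for m in re.finditer(pattern, s)]
print(matches)

# The class rejects single characters, not the word "UTC":
ok = re.fullmatch(r"[^(UTC)]*", "19_38_20")   # a match object
rejected = re.fullmatch(r"[^(UTC)]*", "Clock")  # None, because of the "C"
print(ok is not None, rejected is None)
```

So the pattern finds both timestamps in case 1, and picking the rightmost one means taking the last element of the result.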

bdeshi
Posts: 4249
Joined: 12 Mar 2014 17:27
Location: Asteroid B-612 / Dhaka

Post by bdeshi »

Code: Select all

"\([^(]+UTC\)$"

binocular222
Posts: 1416
Joined: 04 Nov 2008 05:35
Location: Hanoi, Vietnam

Post by binocular222 »

Code: Select all

\([0-9_ ]+UTC\)$
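
Both patterns above rely on `$`: anchoring to the end of the string means only the final parenthesised group can match, so no "rightmost" logic is needed. A minimal check in Python (assumptions: Python's `re` stands in for XYplorer's engine, and the strings are the three samples without their surrounding quotes):

```python
import re

samples = [
    r"c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)",
    r"c:\PK5_7404 (2014_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)",
    r"c:\PK5_7404 (2014_09_16 1) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)",
]

whole = []
for pattern in (r"\([^(]+UTC\)$", r"\([0-9_ ]+UTC\)$"):
    for s in samples:
        m = re.search(pattern, s)
        # The match still includes the parentheses and " UTC".
        whole.append(m.group(0))
        print(pattern, "->", m.group(0))

# A capture group (added here for illustration) trims the match to the bare timestamp.
bare = [re.search(r"\(([0-9_ ]+) UTC\)$", s).group(1) for s in samples]
print(bare)
```

The trade-off is that these patterns only work when the wanted group really is the last thing in the string.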

Post by bdeshi »

Code: Select all

"^.+(\([^(]+UTC\))"
[$1 == match]

Code: Select all

 $str = "c:\PK5_7404 (2014_09_16 19_38_34 UTC) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)";
 // $1 holds the rightmost "(... UTC)" group; echo displays it
 echo regexreplace($str, "^.+(\([^(]+UTC\))", "$1");

Stefan
Posts: 1360
Joined: 18 Nov 2008 21:47
Location: Europe

Post by Stefan »

Filehero wrote:Given the strings

Code: Select all

1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
2) "c:\PK5_7404 (2014_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
3) "c:\PK5_7404 (2014_09_16 1) ((Blub Bla) Bla Blub (2014_09_16 19_38_20 UTC)"
What is the proper RegEx to always perfectly get the rightmost timestamp "2014_09_16 19_38_20"?
The regex engine used here is greedy by default.
So ".+\(" runs as far right as it can and ends on the rightmost opening parenthesis "(" from which the rest of the pattern can still match.
Next, "(.+)" captures everything up to " UTC", and ".*" consumes whatever is left.

So I would use the expression ".+\((.+) UTC.*" and use "$1" to get back what was matched by the "(.+)" group.
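
To see where the greedy `.+` actually stops, here is a small Python sketch (Python's `re` is used as a stand-in, on the assumption that XYplorer's engine backtracks the same way). The `" (copy)"` suffix is a made-up addition to show what happens when the last `(` is not followed by ` UTC`.

```python
import re

# Sample 1 from the POC (the years differ so the two timestamps can be told apart).
s = r"c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"

m = re.match(r".+\((.+) UTC.*", s)
print(m.group(1))
# The captured group starts right after the last "(" in the string.
print(m.start(1) == s.rfind("(") + 1)

# A trailing "(" with no " UTC" after it is skipped: the engine backtracks
# to the rightmost "(" from which the rest of the pattern still matches.
t = s + " (copy)"
m2 = re.match(r".+\((.+) UTC.*", t)
print(m2.group(1))
```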


POC:

Code: Select all

$input = <<<$$$
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
$$$;

 ForEach( $line, $input, "<crlf>"){
   //regexreplace(string, pattern, replacement, [matchcase])
   $result = regexreplace($line, ".+\((.+) UTC.*", "$1");
   msg "$line<crlf 3>$result";
 }

Results:

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"


2015_09_16 19_38_20
---------------------------
OK   
---------------------------

Code: Select all

---------------------------
XYplorer
---------------------------
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"


2017_09_16 19_38_20
---------------------------
OK   
---------------------------

Code: Select all

---------------------------
XYplorer
---------------------------
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"


2019_09_16 19_38_20
---------------------------
OK   
---------------------------

HTH? :D

Post by Filehero »

Hi binocular222, Sammay & Stefan
Stefan wrote:HTH? :D
Yes, all of you. Thanks a lot! :D


So with regexmatches, was mine the shortest? Or is it a rule of thumb to use regexreplace from the very beginning whenever the "boundaries" should not be part of the result string?
Stefan wrote:So I would use the expression: ".+\((.+) UTC.*" and utilizing "$1" to get back what was matched by our "(.+)" search pattern.
In this case, "$1" is not a real "replacement" but the back reference to the match, which is then returned instead of a replaced match?


Me and RegEx - each time a new adventure, but a funny one. :ninja:

Cheers,
Filehero

Post by Stefan »

Filehero wrote:In this case, "$1" is not a real "replacement"
but the back reference to the match which then will be returned instead of a replaced match?
Right.
IOW: I would say that "$1" is always the back reference, which is often used as the replacement or, like here, to cut out a part of the string.
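
The cut-out works because the pattern consumes the whole string, so replacing the match with `$1` leaves only the captured part. One consequence worth knowing: if the pattern does not match at all, nothing is replaced and the input comes back unchanged. A hedged sketch in Python, where the back reference is written `\1` instead of `$1`; that XYplorer's `regexreplace` behaves the same on a non-match is an assumption, not something verified here. The helper `cut_out` and the string `bad` are made up for illustration.

```python
import re

PATTERN = r".+\((.+) UTC.*"

def cut_out(line):
    """Return the rightmost timestamp, or None if the pattern does not match."""
    # fullmatch requires the pattern to cover the entire line, as the cut-out needs.
    m = re.fullmatch(PATTERN, line)
    return m.group(1) if m else None

good = r"c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
bad = r"c:\PK5_7404 (no timestamp here)"

# Replace-style cut-out: the whole match is swapped for group 1.
replaced = re.sub(PATTERN, r"\1", good)
print(replaced)
# No match means no replacement: the original string is returned as is.
print(re.sub(PATTERN, r"\1", bad) == bad)
# The search-style helper makes the failure explicit instead.
print(cut_out(good), cut_out(bad))
```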



Explanations:

regexmatches()
Returns a LIST of all matches of a regular expression pattern in a given string.

regexreplace()
Replaces parts of a string, using a regular expression pattern.


So with regexmatches() you could collect (for example) all timestamps into a new list (like an array)
and then pick one from that list, or process them one after the other in a foreach loop.

Code: Select all

   //regexmatches(string, pattern, [separator="|"], [matchcase=0])
   $result = regexmatches($input, "[\d_ ]{19}", "<crlf>"); 
With regexreplace() you can replace parts of the original string to get a new string, or just cut out the part you want (as I did)

Code: Select all

   //regexreplace(string, pattern, replacement, [matchcase])
   $result = regexreplace($line, ".+\((.+) UTC.*", "$1");



This should make the difference clearer: depending on the case, you have to choose the right tool for the job.


###################################




POC:


regexmatches()
Returns a LIST of all matches of a regular expression pattern in a given string. (In our case above we only wanted one single match.)

Code: Select all

$input = <<<$$$
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
$$$;

   //regexmatches(string, pattern, [separator="|"], [matchcase=0])
   $result = regexmatches($input, "[\d_ ]{19}", "<crlf>");
   msg "$input<crlf 3>$result";
 
Result:

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"


2014_09_16 19_38_20
2015_09_16 19_38_20
2016_09_16 19_38_20
2017_09_16 19_38_20
2019_09_16 19_38_20
---------------------------
OK   
---------------------------
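
Note that the list above has five entries, not six: `(2018_09_16 1)` contains only 12 characters from the class `[\d_ ]`, so `{19}` cannot match there. The same behaviour can be reproduced with Python's `re.findall`, used here as an assumed equivalent of `regexmatches()`; the stricter pattern at the end is an illustrative alternative, not something from the thread.

```python
import re

text = "\n".join([
    r' 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"',
    r' 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"',
    r' 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"',
])

# [0-9_ ] is spelled out instead of \d so that only ASCII digits,
# underscores and blanks can take part in a match.
found = re.findall(r"[0-9_ ]{19}", text)
print("\n".join(found))

# A stricter pattern spells out the timestamp shape.
strict = re.findall(r"\d{4}_\d{2}_\d{2} \d{2}_\d{2}_\d{2}", text)
print(strict == found)
```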

OR JUST THE LAST ONE FROM THE RESULT LIST:

Code: Select all

$input = <<<$$$
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
$$$;


 ForEach( $line, $input, "<crlf>"){

   // capture the matching parts of the original string into a result string $tmp:
   //regexmatches(string, pattern, [separator="|"], [matchcase=0])
   $tmp = regexmatches($line, "[\d_ ]+", ",");

   // get the last (" -1 ") item from the $tmp string:
   //gettoken(string, [index=1], [separator=" "], [format], [flags])
   $result = gettoken($tmp, -1, ",");

   // example output:
   msg "$line<crlf 3>$result";
 }
Results:

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"

2015_09_16 19_38_20 
---------------------------
OK   
---------------------------
and so on...
2017_09_16 19_38_20
2019_09_16 19_38_20
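
Taking the last item works here only because nothing after the final timestamp matches `[\d_ ]+`. Note also the trailing space in the result, since the blank before `UTC` belongs to the character class. A minimal Python sketch of the same idea, with `re.findall` and index `-1` standing in (as an assumption) for `regexmatches()` and `gettoken(..., -1, ...)`:

```python
import re

line = r' 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"'

tokens = re.findall(r"[0-9_ ]+", line)
last = tokens[-1]
print(repr(last))          # the trailing blank is part of the match
print(repr(last.strip()))  # strip() removes it

# If something matching the class follows the timestamp, the last token is wrong.
# The " 2" suffix is a made-up example.
wrong = re.findall(r"[0-9_ ]+", line + " 2")[-1]
print(repr(wrong))
```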


------------- or:


FOR EACH ITEM FROM THE RESULT LIST:

Code: Select all

$input = <<<$$$
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
$$$;


 ForEach( $line, $input, "<crlf>"){

      //regexmatches(string, pattern, [separator="|"], [matchcase=0])
      $tmp = regexmatches($line, "[\d_ ]{19}", ",");
      
      ForEach( $index, "1,2", ","){
          //gettoken(string, [index=1], [separator=" "], [format], [flags])
          $result = gettoken($tmp, $index , ",");
          if($result){msg "$line<crlf 3>$result";}
      }
 }
 
Results:

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"

2014_09_16 19_38_20
---------------------------
OK   
---------------------------

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"

2015_09_16 19_38_20
---------------------------
OK   
---------------------------
and so on...

2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
2016_09_16 19_38_20

2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
2017_09_16 19_38_20

3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
2019_09_16 19_38_20
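
The inner loop above walks a fixed index list `1,2`, which fits these samples because no line has more than two matches. A sketch of the same per-line iteration in Python (an assumed stand-in for the XYplorer loop), where the number of matches is not fixed in advance:

```python
import re

lines = [
    r' 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"',
    r' 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"',
    r' 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"',
]

# Map each line number to the list of 19-character timestamps found in it.
per_line = {}
for number, line in enumerate(lines, start=1):
    per_line[number] = re.findall(r"[0-9_ ]{19}", line)
    for stamp in per_line[number]:
        print(number, stamp)
```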



###################################

regexreplace()
Replaces parts of a string, using a regular expression pattern.

Code: Select all

$input = <<<$$$
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"
 2) "c:\PK5_7404 (2016_09_16 19_38_20 ) ((Blub Bla) Bla Blub (2017_09_16 19_38_20 UTC)"
 3) "c:\PK5_7404 (2018_09_16 1) ((Blub Bla) Bla Blub (2019_09_16 19_38_20 UTC)"
$$$;

 ForEach( $line, $input, "<crlf>"){
   //regexreplace(string, pattern, replacement, [matchcase])
   $result = regexreplace($line, ".+\((.+) UTC.*", "$1");
   msg "$line<crlf 3>$result";
 }

Results:

Code: Select all

---------------------------
XYplorer
---------------------------
 1) "c:\PK5_7404 (2014_09_16 19_38_20 UTC) ((Blub Bla) Bla Blub (2015_09_16 19_38_20 UTC)"


2015_09_16 19_38_20
---------------------------
OK   
---------------------------
and so on...
2017_09_16 19_38_20
2019_09_16 19_38_20



 
HTH? :D

Post by Filehero »

Stefan wrote:HTH? :D
Stefan, as always: yes! :tup: :tup: :tup:

Thanks for the further explanations. I'm sure they will serve some more people over here.

FH
