A bit of regex help needed

highend
Posts: 13274
Joined: 06 Feb 2011 00:33

A bit of regex help needed

Post by highend »

Hi,

I want to use a function in my script that converts Windows paths (with or without file names) into Unix paths.

The function gets its input from a global variable, which can contain:
1.) one single Windows path without a file name; e.g. <curpath>
2.) one single Windows path with a file name; e.g. <curitem>
3.) a |-separated list of Windows paths with file names; e.g. get("SelectedItemsPathNames", "|");

It will never get a |-separated mix of paths with and without file names (which would happen if a user selected a file and a folder in the same directory).

Input examples:

Code: Select all

1.) D:\Users\Highend\Tools\Steuer Erklärung
2.) D:\Users\Highend\Tools\Steuer Erklärung\Jahr 2010.pdf
3.) D:\Users\Highend\Tools\Steuer Erklärung\Jahr 2010.pdf|D:\Users\Highend\Tools\Steuer Erklärung\Jahr 2011.pdf
Regardless of which of the three alternatives is the input, I'd like to get a single output:

Code: Select all

1.) /Users/Highend/Tools/Steuer Erklärung
I use two steps to convert the input into Unix style:

$CTU_SelectedItems contains input example 1, 2 or 3.

Code: Select all

$WindowsPath = $CTU_SelectedItems;
$UnixPath = regexreplace($WindowsPath, "([A-Za-z]):", "");
$UnixPath = replace($UnixPath, "\", "/");
After that, $UnixPath contains:

Code: Select all

1.) /Users/Highend/Tools/Steuer Erklärung
or
2.) /Users/Highend/Tools/Steuer Erklärung/Jahr 2010.pdf
or
3.) /Users/Highend/Tools/Steuer Erklärung/Jahr 2010.pdf|/Users/Highend/Tools/Steuer Erklärung/Jahr 2011.pdf
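For comparison, here is the same two-step conversion as a small Python sketch. It is only an illustration: it uses Python's `re` module instead of XYplorer's regex engine (so edge cases may behave differently), and `windows_to_unix` is a hypothetical helper name.

```python
import re

def windows_to_unix(paths: str) -> str:
    # Step 1: drop every drive letter plus colon ("D:"), like "([A-Za-z]):" -> ""
    without_drive = re.sub(r"([A-Za-z]):", "", paths)
    # Step 2: turn every backslash into a forward slash
    return without_drive.replace("\\", "/")

examples = [
    r"D:\Users\Highend\Tools\Steuer Erklärung",
    r"D:\Users\Highend\Tools\Steuer Erklärung\Jahr 2010.pdf",
    r"D:\Users\Highend\Tools\Steuer Erklärung\Jahr 2010.pdf"
    r"|D:\Users\Highend\Tools\Steuer Erklärung\Jahr 2011.pdf",
]
for example in examples:
    print(windows_to_unix(example))
```

Running it prints the three Unix-style lines listed above. The drive-letter pattern is not anchored, but since ":" cannot appear in Windows file or folder names, in practice it only hits the drive prefixes.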
Now I need one last regex to get only the path (for any of the three cases). All paths will be the same; only the file names differ.

So:

Code: Select all

$UnixPath = regexreplace($UnixPath, "???", "$1");
Tia,
highend
One of my scripts helped you out? Please donate via Paypal

Stefan
Posts: 1360
Joined: 18 Nov 2008 21:47
Location: Europe

Re: A bit of regex help needed

Post by Stefan »

Hi, I don't have time for more right now, but:


3.) /Users/Highend/Tools/Steuer Erklärung/Jahr 2010.pdf|/Users/Highend/Tools/Steuer Erklärung/Jahr 2011.pdf

- check whether there is a "|"; if so, use gettoken to get only the first part


3.) /Users/Highend/Tools/Steuer Erklärung/Jahr 2010.pdf
- to get only the path, I would not use a regex such as "(.+)/.+", "$1"

but maybe an "_InStrRev Function" >> http://www.xyplorer.com/xyfc/viewtopic. ... 515#p57515



Perhaps more later...
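Stefan's two suggestions (take the first |-token, then cut off the last path component) can be sketched in Python like this. The function names are hypothetical, Python's `str.rfind` stands in for the `_InStrRev` function linked above, and Python's `re` is not XYplorer's regex engine.

```python
import re

def parent_path(unix_paths: str) -> str:
    # Like gettoken(..., 1, "|"): keep only the first entry of a |-separated list
    first = unix_paths.split("|")[0]
    # The greedy (.+) backtracks to the LAST "/", so group 1 is the parent path
    return re.sub(r"(.+)/.+", r"\1", first)

def parent_path_rfind(unix_paths: str) -> str:
    # Regex-free variant in the spirit of InStrRev: cut at the last "/"
    first = unix_paths.split("|")[0]
    cut = first.rfind("/")
    return first[:cut] if cut > 0 else first

listing = ("/Users/Highend/Tools/Steuer Erklärung/Jahr 2010.pdf"
           "|/Users/Highend/Tools/Steuer Erklärung/Jahr 2011.pdf")
print(parent_path(listing))
print(parent_path_rfind(listing))
```

Both give "/Users/Highend/Tools/Steuer Erklärung" for input 3. Note that they would also cut "Steuer Erklärung" off input 1 (a folder path without a file name), so that case has to be detected separately.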

highend

Re: A bit of regex help needed

Post by highend »

- check whether there is a "|"; if so, use gettoken to get only the first part
Yes, I think that's the easiest way.

$UnixPathWithFileName contains the converted Unix path(s),
e.g.: /Users/Highend/Tools/Steuer Erklärung/Jahr 2010.pdf|/Users/Highend/Tools/Steuer Erklärung/Jahr 2011.pdf

Code: Select all

$SearchForSeparator = strpos("$UnixPathWithFileName", "|");
	if($SearchForSeparator != "-1") {
		$UnixPathWithoutFileName = gettoken("$UnixPathWithFileName", 1, "|");
	} else {
		$UnixPathWithoutFileName = $UnixPathWithFileName;
	}

	$UnixPathWithoutFileName = regexreplace($UnixPathWithoutFileName, "(/.*(?=\/.*))(.*$)", "$1");
This gives the expected result.
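A Python sketch of highend's lookahead pattern, assuming Python's `re` handles it the same way XYplorer does for these inputs; `strip_last_component` is a hypothetical name. It also checks that the result matches the simpler greedy pattern "(.+)/.+" from Stefan's reply.

```python
import re

def strip_last_component(path: str) -> str:
    # Group 1: "/" plus as much as possible, as long as "/..." still follows
    # (the lookahead), i.e. everything before the last "/".
    # Group 2: the last "/" and whatever comes after it, which is dropped.
    return re.sub(r"(/.*(?=\/.*))(.*$)", r"\1", path)

path = "/Users/Highend/Tools/Steuer Erklärung/Jahr 2010.pdf"
print(strip_last_component(path))  # /Users/Highend/Tools/Steuer Erklärung
# Same result as the simpler greedy pattern "(.+)/.+"
print(re.sub(r"(.+)/.+", r"\1", path))
```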

Thanks, Stefan.

P.S.: I'm at a birthday party today and will be back home late, so no further replies from me until then.

tiago
Posts: 589
Joined: 14 Feb 2011 21:41

Re: A bit of regex help needed

Post by tiago »

I have this on the clipboard and need to extract the "subtitleserve" links:

Code: Select all

text
http://www.opensubtitles.org/en/subtitleserve/sub/abc
http://www.opensubtitles.net/en/movie-subtitles-searcher
http://www.opensubtitles.org/en/subtitleserve/sub/def
text
The following code does match them (if I change $1 to $2, the target links disappear), but how do I get rid of everything else?

Code: Select all

    $src = "<clipboard>";
    $res = regexreplace($src, ".*?(http://www.opensubtitles.org/en/subtitleserve/sub/.*).*", "$1");
    msg $res;
Desired output:

Code: Select all

http://www.opensubtitles.org/en/subtitleserve/sub/abc
http://www.opensubtitles.org/en/subtitleserve/sub/def
Power-hungry user!!!

Stefan

Re: A bit of regex help needed

Post by Stefan »

tiago wrote:I need to match the "subtitleserve" links
My solution:

Having this at the clipboard:

Code: Select all

text
http://www.opensubtitles.org/en/subtitleserve/sub/abc
http://www.opensubtitles.net/en/movie-subtitles-searcher
http://www.opensubtitles.org/en/subtitleserve/sub/def
text

Executing this code:

Code: Select all

$array="";
   foreach($line, "<clipboard>", "<crlf>"){if (strpos($line, "subtitleserve") > 0){$array = "$array$line<crlf>";} }
   text $array;

Getting this output:

Code: Select all

http://www.opensubtitles.org/en/subtitleserve/sub/abc
http://www.opensubtitles.org/en/subtitleserve/sub/def

HTH? :D
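The same line filter as a Python sketch, assuming CRLF line endings on the clipboard text (the script splits on <crlf>); `lines_containing` is a hypothetical name.

```python
def lines_containing(text: str, needle: str) -> str:
    kept = ""
    # Assumes CRLF line endings, as in Stefan's foreach(..., "<crlf>")
    for line in text.split("\r\n"):
        # Same test as strpos($line, "subtitleserve") > 0 in the script
        if line.find(needle) > 0:
            kept += line + "\r\n"
    return kept

clip = "\r\n".join([
    "text",
    "http://www.opensubtitles.org/en/subtitleserve/sub/abc",
    "http://www.opensubtitles.net/en/movie-subtitles-searcher",
    "http://www.opensubtitles.org/en/subtitleserve/sub/def",
    "text",
])
print(lines_containing(clip, "subtitleserve"))
```

Like the script, this keeps whole lines, so any extra text on a matching line is kept too.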

tiago

Re: A bit of regex help needed

Post by tiago »

It won't do, Stefan, as "text" means situations like:

Code: Select all

text http://www.opensubtitles.org/en/subtitleserve/sub/abc text
http://www.opensubtitles.net/en/movie-subtitles-searcher text
http://www.opensubtitles.org/en/subtitleserve/sub/def text
text http://www.google.com
and so on. Your solution matches whole lines in which "subtitleserve" appears instead of the valid links only. Thanks for trying anyway!

Stefan

Re: A bit of regex help needed

Post by Stefan »

tiago wrote:It won't do, Stefan, as "text" means situations like:

Code: Select all

text http://www.opensubtitles.org/en/subtitleserve/sub/abc text
http://www.opensubtitles.net/en/movie-subtitles-searcher text
http://www.opensubtitles.org/en/subtitleserve/sub/def text
text http://www.google.com
and so on. Your solution matches whole lines in which "subtitleserve" appears instead of the valid links only. Thanks for trying anyway!
Ah, that was not clear to me from your first example.
OK, then it depends on what comes after "sub/".

Can you describe a pattern that always matches what comes after "sub/"?
Is it always one word (one or more characters, up to a space)?
IOW: how can a computer detect the end of such a URL?

tiago

Re: A bit of regex help needed

Post by tiago »

Numbers.
The ending goes like
12345
or
1234567
Sometimes more, sometimes fewer, but 5 or 7 digits seems to be the norm. And there is 'always' a blank space or a line break after them.

Stefan

Re: A bit of regex help needed

Post by Stefan »

tiago wrote:Numbers.
The ending goes like
12345
or
1234567
Sometimes more, sometimes fewer, but 5 or 7 digits seems to be the norm. And there is 'always' a blank space or a line break after them.
:shock: :roll:
tiago wrote: Desired output:

Code: Select all

http://www.opensubtitles.org/en/subtitleserve/sub/abc
http://www.opensubtitles.org/en/subtitleserve/sub/def
'abc' is not a number!


- - -



You can match a digit with \d
One or more digits are matched by \d+

So your RE could look like:
".*?(http://www.opensubtitles.org/en/subtitleserve/sub/\d+).*";



Test

For this clipboard content:

Code: Select all

text http://www.opensubtitles.org/en/subtitleserve/sub/2 text
text http://www.opensubtitles.org/en/subtitleserve/sub/4567 text test
http://www.opensubtitles.net/en/movie-subtitles-searcher text
http://www.opensubtitles.org/en/subtitleserve/sub/201108021057 text
text http://www.google.com
This script:

Code: Select all

  $array="";
  $REmatch = ".*?(http://www.opensubtitles.org/en/subtitleserve/sub/\d+).*";

  foreach($line, "<clipboard>", "<crlf>"){

    if (strpos($line, "subtitleserve") > 0){
        $URL = regexreplace( $line, $REmatch, "$1" );
        $array = "$array$URL<crlf>";
    }

  }
 text $array;

produces this output:

Code: Select all

http://www.opensubtitles.org/en/subtitleserve/sub/2
http://www.opensubtitles.org/en/subtitleserve/sub/4567
http://www.opensubtitles.org/en/subtitleserve/sub/201108021057
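Stefan's script translated into a Python sketch, assuming CRLF-separated clipboard text; `extract_sub_links` is a hypothetical name and Python's `re` stands in for XYplorer's regex engine. Like the script, it picks up at most one link per line.

```python
import re

# Stefan's pattern; the dots are left unescaped as in the original
RE_MATCH = r".*?(http://www.opensubtitles.org/en/subtitleserve/sub/\d+).*"

def extract_sub_links(clipboard: str) -> str:
    result = ""
    for line in clipboard.split("\r\n"):
        if line.find("subtitleserve") > 0:
            # Replace the whole line with just the captured URL (group 1)
            result += re.sub(RE_MATCH, r"\1", line) + "\r\n"
    return result

clip = "\r\n".join([
    "text http://www.opensubtitles.org/en/subtitleserve/sub/2 text",
    "text http://www.opensubtitles.org/en/subtitleserve/sub/4567 text test",
    "http://www.opensubtitles.net/en/movie-subtitles-searcher text",
    "http://www.opensubtitles.org/en/subtitleserve/sub/201108021057 text",
    "text http://www.google.com",
])
print(extract_sub_links(clip))
```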

tiago

Re: A bit of regex help needed

Post by tiago »

Thanks much Stefan, that's it!

tiago

Re: A bit of regex help needed

Post by tiago »

1. Does anyone have a regex + script to scan files for YouTube links?

2.

Code: Select all

$a = "Changes for v8.1.315 by PSpad.txt"; $bn = regexreplace($a, "(.+\.).*", "$1"); echo $bn;
How do I get that regex to stop including the last (extension) dot?

TIA.

highend

Re: A bit of regex help needed

Post by highend »

1. What part do you need exactly?
e.g.: http://www.youtube.com/watch?v=ZuGgm8UQ ... AAAAAAAEAA
This one? ZuGgm8UQ-7E

Can be done with

Code: Select all

(?<=).+=(.+(?=&feature)).*
Explanation: (?<=).+= <- matches everything up to the "=" in front of the ID. It sits outside the capture group, so it is not part of $1 (an empty lookbehind (?<=) always succeeds, so on its own it has no effect).
(.+(?=&feature)) <- positive lookahead. Captures everything up to (but not including) the following "&feature".
.* <- the rest of the string ofc.
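A Python sketch of the ID extraction. The sample URL is hypothetical (the link above is truncated), `youtube_id` is a hypothetical name, and the empty `(?<=)` is left out because it does not change the match.

```python
import re

def youtube_id(url: str) -> str:
    # ".+=" consumes (without capturing) everything up to the "=" before the ID;
    # the capture group stops right before "&feature" thanks to the lookahead.
    # A URL without "&feature" is not matched and comes back unchanged.
    return re.sub(r".+=(.+(?=&feature)).*", r"\1", url)

print(youtube_id("http://www.youtube.com/watch?v=ZuGgm8UQ-7E&feature=related"))
```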

2.

Code: Select all

$bn = regexreplace($a, "(.+(?=\.)).*", "$1");
It uses a positive lookahead, and the parentheses of a lookahead do not create a backreference (so you can still use the normal $1 replacement expression). Because .+ is greedy, the lookahead ends up at the last dot.
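Both patterns side by side, as a Python sketch using Python's `re` module rather than XYplorer's engine:

```python
import re

name = "Changes for v8.1.315 by PSpad.txt"
# Original attempt: the dot is inside the group, so it ends up in $1
with_dot = re.sub(r"(.+\.).*", r"\1", name)
# Lookahead version: the last "." is required to follow but is not captured
without_dot = re.sub(r"(.+(?=\.)).*", r"\1", name)
print(with_dot)     # Changes for v8.1.315 by PSpad.
print(without_dot)  # Changes for v8.1.315 by PSpad
```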

tiago

Re: A bit of regex help needed

Post by tiago »

1. No, I need to match the whole thing: http://www.youtube.com/watch?v=ZuGgm8UQ, stopping at the first "&", which in most cases is a valid delimiter. But then the question is: how do I plug the regex into a script? I once found some code that matched perfectly... except that all the addresses were wiped out in the final output message! I need the opposite, i.e., everything else should go and only the clean addresses should remain.

2. Hey thanks! :appl: :appl: :appl:

highend

Re: A bit of regex help needed

Post by highend »

Where exactly is your problem?

(.+?(?=&)).*

Catches everything from the URL up to the first "&" character (as long as the http starts at the beginning of a line).
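A quick Python check of this pattern on two hypothetical sample tokens (Python's `re`, not XYplorer's engine; `up_to_first_ampersand` is a hypothetical name). The lazy `.+?` stops at the first "&", and a token with no "&" at all is not matched, so it comes back unchanged.

```python
import re

def up_to_first_ampersand(token: str) -> str:
    # Lazy ".+?" grows only until the lookahead sees "&"; the "&" itself
    # and everything after it fall into the trailing ".*" and are dropped
    return re.sub(r"(.+?(?=&)).*", r"\1", token)

print(up_to_first_ampersand("http://www.youtube.com/watch?v=abc123&feature=related"))
print(up_to_first_ampersand("http://www.youtube.com/watch?v=vvvvv"))
```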

But as always: regexes depend on context, so if you're unsure, post a sample file.

And for gathering things from a file look at this thread: http://www.xyplorer.com/xyfc/viewtopic. ... hilit=html

Should cover the basics.

tiago

Re: A bit of regex help needed

Post by tiago »

Following your pointers, I did:

Code: Select all

// step;
  $files = folderreport("files", "r", , , , "|");

   $re = "";
   foreach($file, "$files", "|"){
      $content = readfile("$file", "t");
         foreach($line, "$content", "<crlf>"){

   foreach($token, $line, " ") {
// step;
            $type = regexreplace("$token", "(.+?(?=&)).*", "$1");
            if($type != $token){  $re = "$re" . "$type<crlf>"; }
            elseif($type == $token){ }
                               }
         }
   }

         text $re;
But this misses YouTube links without the "&" separator, which is expected from the code but not what I want. For example, "http://www.youtube.com/watch?v=vvvvv" is not reported. How can I solve this?
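One possible approach, sketched in Python and not taken from the thread: instead of cutting at "&", match the link itself and stop at the first "&" or whitespace, so the "&..." part becomes optional. The pattern assumes links of the form http(s)://www.youtube.com/watch?v=...; other forms (e.g. youtu.be short links) are not covered, and `YOUTUBE` and `youtube_links` are hypothetical names.

```python
import re
from typing import List

# Hypothetical pattern: "watch?v=" followed by a run of characters that are
# neither "&" nor whitespace, so a trailing "&..." may or may not be present
YOUTUBE = re.compile(r"https?://www\.youtube\.com/watch\?v=[^&\s]+")

def youtube_links(text: str) -> List[str]:
    # findall returns every non-overlapping match in the whole text
    return YOUTUBE.findall(text)

sample = ("see http://www.youtube.com/watch?v=vvvvv and "
          "http://www.youtube.com/watch?v=abc123&feature=related\r\n"
          "http://www.google.com")
print(youtube_links(sample))
```

Because findall scans the whole text at once, the per-line and per-token loops from the script are not needed in this sketch.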
