Regex: match parts of a name whether or not they exist

rur54 · Post by **rur54** » 19 Mar 2017 06:55

Hi, I am still regex dumb for now...
I'm slowly learning and have looked at some examples here and on regexlib.com.
I've been at it on and off for a few days when I have time, but I decided to ask for some help, as I believe it is quite easy (when you know how).

I have a combination of these filenames:

A_by_B
A_by_B-C
A_by_B-C [D]

C and [D] may or may not be present in any given filename.

I need:
A
B
[D], with brackets, if present

So far I have this, but it fails if C or D is not present. (edit)

Code:

(.+)(?:_by_)(.*)?-(.*)?(\[.*\])

Thanks a lot.
Cheers
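Why the attempt above fails: the `?` after `(.*)` only makes that captured text optional, while the literal `-` and the `(\[.*\])` group are still required, so a name without C or [D] can never match. Putting each optional part in its own group followed by `?` avoids that. The sketch below tests the idea with Python's `re` module rather than in XYplorer; `split_name` is a hypothetical helper, and it assumes B never contains `-` or `[`.

```python
import re

# Matches A_by_B, A_by_B-C, A_by_B-C [D] and A_by_B [D].
#   (.+?)          group 1: A, lazily up to the first "_by_"
#   ([^\[-]+?)     group 2: B, no "-" or "[", lazy so it stops before the optional parts
#   (?:-[^\[]*?)?  optional "-C" part, not captured
#   \s*            spaces before the tags
#   (\[.*\])?      group 3: optional "[D]" part, brackets included
NAME_RE = re.compile(r"^(.+?)_by_([^\[-]+?)(?:-[^\[]*?)?\s*(\[.*\])?$")


def split_name(name):
    """Return (A, B, D), with D = None when there is no [..] part."""
    m = NAME_RE.match(name)
    if m is None:
        return None  # not an "A_by_B" name at all
    return m.group(1), m.group(2), m.group(3)


for n in ["A_by_B", "A_by_B-C", "A_by_B-C [D]", "A_by_B [D]"]:
    print(n, "->", split_name(n))
```

The lazy `+?` and `*?` quantifiers are what keep B from swallowing the space in front of the tags.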

rur54 · Post by **rur54** » 19 Mar 2017 11:57

As I could not get this simple one-line regex to work, I did it the long way...
This seems to work, but I am sure there is a better way.

Not really important, but if anyone has a better solution, I would love to know.
Cheers

Code:

	$List = <<<>>>
doc1_by_name.doc
doc2_by_name-temp.doc
doc3_by_name [tag1 tag2].doc
doc4_by_name-temp [tag1 tag2].doc
>>>;

	// $List = get("SelectedItemsPathNames", "|");
	//foreach($item, $testList){
	foreach($item, $List, "<crlf>"){
		$filename = getpathcomponent("$item", "base");		//filename without extension

		//first check if it contains "_by_" - else skip rename
		if (strpos($filename, "_by_") != -1){
			
			$i="";
			foreach($token, $filename, "_by_") {
				$i++;
				if ($i == 1) {
					$title = $token;
				}
				elseif($i == 2) {
					If (strpos($token, "-") != -1) { // if second token contains "-" split it
						$n="";
						foreach($token2, $token, "-") {
							$n++;
							if($n == 1) {
								$name = $token2;
							}
						}
					}
					else {		//if it does not contain "-", then get token name before brackets
						$name = RegExReplace($token, "(.*)\[.*", "$1");	//get name before brackets
						$name = RegExReplace($name,"^[ \t]+|[ \t]+$",""); //Removes spaces from start and end if present.
					};
				};
			};
			
			//this was on top but I moved it to the bottom
			//Check if it contains tags in brackets "[*]" then get text with brackets.
			If ( strpos($filename, '[', -1) > -1  AND strpos($filename, ']', -1) > strpos($filename, '[', -1)) {
				$fntags = RegExReplace("$filename", "(.*)(\[.*\])(.*)", "$2");	//get tags "with" brackets "[*]"
				$newName = $name . '_' . $title . ' ' . $fntags;
			}
			else {	//else it has no tags
				$newName = $name . '_' . $title;
			};

		}
		else {	//if not contains "_by_" skip rename.
			continue;
		};
		
		echo "Before / After <crlf 2>$filename <crlf 2>$newName";
		//renameitem("$newName", "$item", , "-01");		//[Default] Smart (keep extension unless extension is passed), auto-suffix on collision.

	};
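The long way above boils down to three steps: split on `_by_`, cut the author at `-` (or, failing that, at `[`) and trim it, then append the bracketed tags if there are any. A rough Python translation of that logic, for illustration only (`rename_step_by_step` is a hypothetical name, not an XYplorer function), could look like this; like the script, it skips names without `_by_`, here by returning `None`.

```python
def rename_step_by_step(base):
    """Turn 'title_by_author-temp [tags]' into 'author_title [tags]'.

    `base` is the file name without its extension.
    Returns None when the name has no '_by_' (the script skips those).
    """
    if "_by_" not in base:
        return None
    title, rest = base.split("_by_", 1)

    # Author: cut at '-' if present, otherwise at '[', then trim spaces.
    if "-" in rest:
        author = rest.split("-", 1)[0]
    else:
        author = rest.split("[", 1)[0]
    author = author.strip()

    new_name = author + "_" + title
    # Tags: keep the bracketed part, brackets included, if there is one.
    start, end = base.find("["), base.rfind("]")
    if start != -1 and end > start:
        new_name += " " + base[start:end + 1]
    return new_name
```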

FluxTorpedoe · Post by **FluxTorpedoe** » 19 Mar 2017 21:38

Hi’

Here’s a quick and (very) dirty one-line regex which does the trick—at least based on your example.

regexreplace($List, "^([^\r]+)_by_([^\r\[-]+[^\s\r\.\[-])[^\r\[]*( \[[^\r]+\])*\.[^\r\.]+(\r\n)*", "$2_$1$3$4");

It could certainly be much cleaner, especially if items are processed one by one and without their extension.
(All the \r exclusions restrict each match to a single line, \. matches the dot before the extension so it can be discarded, etc.)

Code:

  $List = <<<>>>
doc1_by_name.doc
doc2_by_name-temp.doc
doc3_by_name [tag1 tag2].doc
doc4_by_name-temp [tag1 tag2].doc
>>>;

  $List2 = regexreplace($List, "^([^\r]+)_by_([^\r\[-]+[^\s\r\.\[-])[^\r\[]*( \[[^\r]+\])*\.[^\r\.]+(\r\n)*", "$2_$1$3$4");

  text $List2;
  
// output:
//
// name_doc1
// name_doc2
// name_doc3 [tag1 tag2]
// name_doc4 [tag1 tag2]
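To try this one-line regex outside XYplorer, it can be run over the same list with Python's `re.sub`. Two translation assumptions: `re.M` is passed so that `^` matches at the start of each line in Python, and the backreferences are written `\2_\1\3\4` because Python does not use the `$1` syntax. The list is joined with CRLF line breaks, which the pattern's `(\r\n)*` part relies on.

```python
import re

names = [
    "doc1_by_name.doc",
    "doc2_by_name-temp.doc",
    "doc3_by_name [tag1 tag2].doc",
    "doc4_by_name-temp [tag1 tag2].doc",
]
text = "\r\n".join(names)  # CRLF line breaks, as the pattern expects

# Same pattern as the XYplorer one-liner; [^\r] keeps each match on one line.
pattern = r"^([^\r]+)_by_([^\r\[-]+[^\s\r\.\[-])[^\r\[]*( \[[^\r]+\])*\.[^\r\.]+(\r\n)*"
result = re.sub(pattern, r"\2_\1\3\4", text, flags=re.M)
print(result)
```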

Hope it’s useful,

Flux

rur54 · Post by **rur54** » 20 Mar 2017 22:05

Thank you for your time and help.
This works great on that example list, but it fails if I change the list or run it on actual selected files.
I tried changing things around, but I am just not that skilled in regex. Now I am not sure it is even possible to write a regex that does what I need.
Cheers

FluxTorpedoe · Post by **FluxTorpedoe** » 21 Mar 2017 02:11

@rur54
Indeed, depending on your real files, it might be easier to "dismantle" file names step by step like you did…

@rur54 & @all
But since I’d started, I thought I might as well use this opportunity to make a tiny Regex Tutorial, for anyone interested!
(With a cleaner regex that is more readable and, hopefully, tweakable!)

BTW, the original regex also works on real files (with the names you provided) when used in the script below, as long as gpc($item, "file") is used instead of "base", since that regex expects the extension to be part of the name.

⮚ So here’s a practical case with a cleaner regex that processes each file one by one:

• Original files:
doc1_by_author.doc
doc2_by_author-temp.doc
doc3_by_author [tag1 tag2].doc
doc4_by_author-temp [tag1 tag2].doc

• Renamed files:
author_doc1.doc
author_doc2.doc
author_doc3 [tag1 tag2].doc
author_doc4 [tag1 tag2].doc

Code:

  $List = listpane(, , ,<crlf>);

  foreach($item, $List, <crlf>) {
    $newName = gpc($item, "base");
    $newName = regexreplace($newName, "^(.+)_by_([^\[-]+[^\s\[-])[^\[]*( \[[^\]]+\])*$", "$2_$1$3");
    renameitem("$newName", "$item", , "-01");
  }
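The same per-file logic can be checked outside XYplorer. This Python sketch only computes the new names and renames nothing; it assumes gpc($item, "base") corresponds to the name without its extension (as a comment in the earlier script says) and that renameitem keeps the extension, so it splits the extension off with `os.path.splitext` and adds it back. `new_name_for` is a hypothetical helper.

```python
import os
import re

# Cleaner per-file pattern, applied to the name without its extension.
AUTHOR_RE = re.compile(r"^(.+)_by_([^\[-]+[^\s\[-])[^\[]*( \[[^\]]+\])*$")


def new_name_for(filename):
    base, ext = os.path.splitext(filename)  # "doc2_by_author-temp", ".doc"
    # re.sub returns the name unchanged when the pattern does not match.
    return AUTHOR_RE.sub(r"\2_\1\3", base) + ext


for f in ["doc1_by_author.doc", "doc2_by_author-temp.doc",
          "doc3_by_author [tag1 tag2].doc", "doc4_by_author-temp [tag1 tag2].doc"]:
    print(f, "->", new_name_for(f))
```

Two caveats this makes easy to see: the author group `([^\[-]+[^\s\[-])` needs at least two characters, so a name like `x_by_y.txt` is left unchanged; and because `( \[[^\]]+\])*` is a repeated group, only its last repetition is captured, so `doc5_by_author [a] [b].txt` keeps only `[b]`. Writing that part as `((?: \[[^\]]+\])*)` should capture all the tag groups (reasoning from Python's `re`, not tested in XYplorer).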

⮚ So we have:

• the “title”: everything from the start (^) up to, but not including, _by_
   ^(.+)_by_

• the “author”: everything that follows, excluding - (the temp suffix) and [ (the tags), and not ending with a space (\s), so that the space is left for potential tags!
   ([^\[-]+[^\s\[-])

• a potential “-temp” suffix to discard: whatever follows the author up to, but not including, a potential [ (tags)
   [^\[]*

• potential “tags”: a space followed by text in [brackets], at the end of the name ($)
   ( \[[^\]]+\])*$
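The four parts above can also be written as one commented pattern using Python's `re.VERBOSE` flag, which ignores unescaped whitespace and `#` comments in the pattern; that is why the literal space before the tag bracket is written `\ `. It is the same regex as in the script, just spread out for readability. Whether XYplorer accepts a similar free-spacing form is not covered in the thread, so treat this as a Python-only aid.

```python
import re

PARTS_RE = re.compile(r"""
    ^(.+)_by_             # title: everything before "_by_"
    ([^\[-]+[^\s\[-])     # author: no "-" or "[", not ending in a space
    [^\[]*                # optional "-temp" part, up to a potential "["
    (\ \[[^\]]+\])*$      # optional " [tags]" at the end of the name
""", re.VERBOSE)

m = PARTS_RE.match("doc4_by_author-temp [tag1 tag2]")
print(m.groups())  # ('doc4', 'author', ' [tag1 tag2]')
```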

Hope this might help someone!

Have a nice day,

Flux

rur54 · Post by **rur54** » 21 Mar 2017 03:44

Thank you for this tutorial.
I was going to ask, but did not want to waste your time.
Cheers again.

XYplorer Beta Club
