Extract Paths.

tiago
Posts: 589
Joined: 14 Feb 2011 21:41

Extract Paths.

Post by tiago »

Hello people.
I'm trying to use readfile to find pathnames inside files and then break them apart: file names out of paths, paths out of pathnames, base drives only, and first parent folders, so I can build reports from the documents readfile is able to reach.

I tried some patterns and built a few myself, but since I'm not familiar with regex this is leading nowhere.
Maybe the community can help me with this.
First I need a pattern to find paths inside a file (finding them is no problem, I just don't have the regex to strip the info out):
C:\Documents and Settings\Administrador\Meus documentos\Contas a receber.docx
D:\Documentos\Admin\Favoritos\Tarefas a cumprir.rtf
Depending on the scanned document, this can be found like:
(name of the computer)C:\ $ (3 non-characters or something like that) \\(name of the computer)\Documents and Settings\Administrador\Meus documentos\Contas a receber.docx
(name of the computer)D:\ $ (3 non-characters or something like that) \\(name of the computer)\Documentos\Admin\Favoritos\Tarefas a cumprir.rtf
I considered hard-coding the computer name and the $(chars) part, but in the real world that failed.

That done, I need:
Files out of paths: Contas a receber.docx, Tarefas a cumprir.rtf
Paths out of pathnames: Documents and Settings\Administrador\Meus documentos\, Documentos\Admin\Favoritos\
Base drives: C:\, D:\
First parent folders: Meus documentos, Favoritos
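
As a reference for the four pieces listed above, here is a minimal sketch in Python rather than XYplorer script, so it only illustrates the splitting logic. The function name `split_pathname` and the dictionary keys are made up for this example, and it assumes the input is already a clean Windows pathname with a drive letter.

```python
from pathlib import PureWindowsPath


def split_pathname(pathname):
    """Split a clean Windows pathname into the four pieces listed above.

    Hypothetical helper for illustration; assumes a path like C:\\a\\b\\file.ext.
    """
    p = PureWindowsPath(pathname)
    drive = p.anchor                                 # e.g. "C:\"
    rel = p.relative_to(drive) if drive else p       # pathname without the drive
    folders = rel.parts[:-1]                         # every folder, in order
    return {
        "file": p.name,                              # file out of path
        "path": "\\".join(folders) + "\\" if folders else "",  # path out of pathname
        "drive": drive,                              # base drive
        "parent": p.parent.name,                     # first parent folder
    }


info = split_pathname(
    r"C:\Documents and Settings\Administrador\Meus documentos\Contas a receber.docx"
)
for key, value in info.items():
    print(key, "=", value)
```

PureWindowsPath is used because it parses backslash paths the same way on any operating system, so the sketch does not need to run on Windows.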

If you prefer to deliver something like $fixedPattern, fine: I can plug it in, assuming the rest of the script does the "reverse match". What I mean is that when I get something that works via regexreplace, it replaces the matching pattern and leaves everything else that is not a match, which is expected but not what I want here.

Suggestion: XY could support something like regexFind, which returns the matches of a regex, and regexInvert, which returns everything other than the matches. For us regex-disabled people it would then be easy to pick up a working pattern from out there, or to build test patterns ourselves and see immediately how they work.
(Man, I'm getting the hang of scripting in XY, it's awesome!)

Ah, and a special thank you to Stefan, whose help was invaluable in another thread regarding the length of strings.
Power-hungry user!!!

Stefan
Posts: 1360
Joined: 18 Nov 2008 21:47
Location: Europe

Re: Extract Paths. Split path file base extension array string

Post by Stefan »

I didn't read all of this unstructured post :wink: , but I guess you want something like an array.

I think I would do it this way:

Code: Select all

//set array  to test string:
$ARRAY = "C:\Documents and Settings\Administrador\Meus documentos\XYplorer\Contas a receber.docx.bak";
   
  $DELIM = "\";     //use to split the string at
   
  //get top amount of delimiters:
  $UBound=1; While(1){ $T = gettoken( $ARRAY, $UBound, $DELIM); IF ($T==""){$UBound--;break;} $UBound++;}
   
  //split the string into parts
  //assign array elements to $Vars:
  $DRIVE             = gettoken( $ARRAY,           1, "\");
  $TopFolder         = gettoken( $ARRAY,           2, "\");
  $SubFolder         = gettoken( $ARRAY,           3, "\");
  $GrantParentFolder = gettoken( $ARRAY, $UBound - 2, "\");
  $ParentFolder      = gettoken( $ARRAY, $UBound - 1, "\");
  $FILE              = gettoken( $ARRAY, $UBound    , "\");
   
   //split FileName into base and extension:
   $Base = "not a file";
   $Exte = "not a file";
   $Index = 0;      //offset from the end of the name, used to find the last "."
   $InStrRev = 0;   //position just after the last "."
   If ($FILE != "")
    {
     If (strpos($FILE, ".") > 0)
     {
      while($Index < strlen($FILE))
       {
         $check = strpos( $FILE, ".", strlen($FILE) - $Index );
         If ($check > -1){$InStrRev = $check +1; break;}
         $Index++;
       }
     $Base = substr($FILE, 0, $InStrRev -1);
     $Exte = substr($FILE, $InStrRev);
     }
    }
   
   
  //test output:
  text Drive: <tab 2> $DRIVE<crlf>
       TopFolder: <tab> $TopFolder<crlf>
       GrantParent: <tab> $GrantParentFolder<crlf>
       ParentFolder: <tab> $ParentFolder<crlf>
       File: <tab 2> $FILE<crlf>
       Base: <tab 2> $Base<crlf>
       Ext: <tab 2> $Exte;
Output for "C:\Documents and Settings\Administrador\Meus documentos\XYplorer\Contas a receber.docx.bak"

Code: Select all

Drive: 		 C:
TopFolder: 	 Documents and Settings
GrantParent: 	 Meus documentos
ParentFolder: 	 XYplorer
File: 		 Contas a receber.docx.bak
Base: 		 Contas a receber.docx
Ext: 		 bak
HTH?

tiago

Re: Extract Paths.

Post by tiago »

Scanning several documents at once and getting the $ARRAYs to work with your script is the remaining, crucial problem, Stefan.

I'm trying to use readfile to find only valid paths like the ones mentioned above [your $ARRAY, for example] and set them up so that I can run your script on them and then finally produce my reports.

But congrats on what you've done.

tiago

Re: Extract Paths.

Post by tiago »

gettoken to extract info. Smart and reliable, it seems. Hats off.

Stefan

Re: Extract Paths.

Post by Stefan »

Not what you want?
I didn't get you.
I will come back and read your post when I am less tired ;-)
But tomorrow I am out of order :-(

tiago

Re: Extract Paths.

Post by tiago »

No prob.
To summarize, this is the idea:
>> readfile to get the $ARRAYs out of text files. Some paths will look like (name of the computer)DRIVE:\ $ (3 non-characters or something like that) \\(name of the computer)\Folder\Subfolder with space on name\File name.txt, so it seems the only solution will be a regex.

>> your script to strip out each drive, grandparent folder, main folder (the folder containing a file), file, and so on (done).

>> my final script to present all of that in a cool report (partially done; it seems it will work with your solution).

tiago

Re: Extract Paths.

Post by tiago »

Any solution to extract this out of an ordinary .txt file?

Code: Select all

(lots of extra stuff before this)   `         ¹ ±    PCLocal D:\ $                \\PCLocal\D Install\Winamp\setup.exe  . . \ s e t u p . e x e (lots of extra stuff after this)
So the working path, and the desired output, is D:\Install\Winamp\setup.exe
A regex that finds the first occurrence of "X:\" and combines it with an entire path "folder\(subfolders)\anything.extension" may be enough in this case. TIA.
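
One possible shape for such a regex, sketched in Python rather than XYplorer script (the pattern would still need to be adapted to XY's own regex functions). It assumes the layout seen in the sample line: a drive root such as D:\, some filler, then \\host\ optionally followed by a share named after the drive letter, then the folders and a file name ending in an extension of 1 to 5 letters or digits. `NAME`, `PATTERN` and `extract_path` are names invented for this sketch.

```python
import re

# Characters allowed in a single file or folder name (assumption of this sketch).
NAME = r'[^\\/:*?"<>|\r\n]'

PATTERN = re.compile(
    r'(?P<drive>[A-Za-z]):\\'              # first drive root, e.g. D:\
    + r'.*?'                               # filler between drive and UNC part
    + r'\\\\[^\\\s]+\\'                    # \\host\
    + r'(?:(?P=drive)[\s\x00-\x1f]+)?'     # optional share named like the drive
    + r'(?P<rest>(?:' + NAME + r'+\\)*'    # folder\subfolder\ (zero or more)
    + NAME + r'*?\.[A-Za-z0-9]{1,5})'      # file name up to its extension
    + r'(?![A-Za-z0-9])',                  # extension must end here
    re.DOTALL,
)


def extract_path(text):
    """Return drive + path for the first match, or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group("drive") + ":\\" + m.group("rest")


sample = (r"`         ¹ ±    PCLocal D:\ $                "
          r"\\PCLocal\D Install\Winamp\setup.exe  . . \ s e t u p . e x e")
print(extract_path(sample))
```

A folder whose name is just the drive letter followed by a space would be misread as the share, and real files may contain non-printable bytes where the sample shows spaces, so this is a starting point to test against real documents.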
