Basic SC question: How best read a text file line by line?

autocart
Posts: 1384
Joined: 26 Sep 2013 15:22

Post by autocart »

I can't find the confirmation for my assumption:

I want to read a file and process its content line by line.
What I came up with:

Code:

$fileContent = readfile("[path]");
  foreach($line, $fileContent, "<crlf>", "e") //"e" to skip empty lines
  {
    //do something with $line
  }
Is this the best way to do it?
E.g., is <crlf> one character or two? The help file makes it look like one ("<crlf> Carriage Return Line Feed (0x0D0A)").
But what if the line separators consist of only a CR character or only an LF character? Or, if <crlf> is one character, what if the file contains CR + LF as two characters at the end of each line?

Or maybe there is a faster way for large files?? I don't know, I am asking.

Besides, are there any other pitfalls that one could fall into with the code I posted? (Don't take this last question too literally. Read between the words, please. :D )

EDIT: Related, that I have looked at, e.g.: How to read UNICODE file content with readfile()?

highend
Posts: 14940
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: Basic SC question: How best read a text file line by lin

Post by highend »

It depends on what should be done on each line. Huge file with many lines -> A foreach loop isn't by default fast...

Regarding <crlf>: Haven't tried different line endings (unix, macos) yet. If in doubt, do a simple regex replace once before the loop.
"\r?\n" is the pattern that matches both Windows (CRLF) and Unix (LF) line endings.

bdeshi
Posts: 4256
Joined: 12 Mar 2014 17:27
Location: Asteroid B-612

Re: Basic SC question: How best read a text file line by lin

Post by bdeshi »

<crlf> as used by XYplorer is always the two characters CR+LF.

autocart
Posts: 1384
Joined: 26 Sep 2013 15:22

Re: Basic SC question: How best read a text file line by lin

Post by autocart »

Thx very much, highend, for this quick info.
Also thx to u, Sammay.

Since it is not the fastest, do u know of another way inside XY that would be faster? Thx.

bdeshi
Posts: 4256
Joined: 12 Mar 2014 17:27
Location: Asteroid B-612

Re: Basic SC question: How best read a text file line by lin

Post by bdeshi »

Here's a Windows+Unix compatible line parser I came up with.

Code:

 $data = readfile('file.txt');
 // show line count
 $lineCount = gettoken(regexmatches($data, '\r?\n'),'count','|');
 echo "found " . $lineCount . " lines";
 
 // normalize into <crlf> linebreak.
 // XY's regex functions search by line and apparently use
 // \r?\n to find line end, hence finds both Windows and Unix lines
 $data = regexmatches($data, '^.*$', <crlf>);
 foreach ($line, $data, <crlf>) {
  text $line;
 }
Edit: this doesn't address the speed issue with foreach loops that highend brought up. However, "huge" and "speed" may be relative in this case: an 875 KB / 1773-line text file was parsed in 485056 microseconds, or about (almost exactly) half a second.

highend
Posts: 14940
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: Basic SC question: How best read a text file line by lin

Post by highend »

I never tested whether a while loop would be faster than a foreach loop. From my internal speed tests of a foreach loop: on a 3.4 GHz i5 quad-core, a foreach loop runs about 1k iterations per second, without doing anything in the loop itself. Processing real data in each line with multiple commands... -> ... So if possible I'd always try to use things like regexreplace / regexmatches, formatlist, etc., but it all depends on what needs to be done.

RalphM
Posts: 2089
Joined: 27 Jan 2005 23:38
Location: Cairns, Australia

Re: Basic SC question: How best read a text file line by lin

Post by RalphM »

SammaySarkar wrote:...An 875kb/1773lines text file was parsed in 55006 milliseconds, or about (almost exactly) half a second.
Sorry to say Sammay but something is not quite right with this calculation?!
Ralph :)

bdeshi
Posts: 4256
Joined: 12 Mar 2014 17:27
Location: Asteroid B-612

Re: Basic SC question: How best read a text file line by lin

Post by bdeshi »

What? Where did you get that? :twisted:
Kidding. I did some weird conversion in my head which didn't properly transcribe in the text there (and now() was short by one 'f'). Replaced with another test result. (also, apparently my hdd read speed is faster before sundown.)

autocart
Posts: 1384
Joined: 26 Sep 2013 15:22

Re: Basic SC question: How best read a text file line by lin

Post by autocart »

SammaySarkar wrote:Here's a Windows+Unix compatible line parser I came up with.

Code:

$data = readfile('file.txt');
 // show line count
  $lineCount = gettoken(regexmatches($data, '\r?\n'),'count','|');
  echo "found " . $lineCount . " lines";
 
 // normalize into <crlf> linebreak.
 // XY's regex functions search by line and apparently use
 // \r?\n to find line end, hence finds both Windows and Unix lines
  $data = regexmatches($data, '^.*$', <crlf>); // <<<<<<------EXCHANGE THIS LINE!!!!!!!!!!!!!!!!!!!
  foreach ($line, $data, <crlf>) {
   text $line;
  }
Thx, Sammay, but ur code works for me only with the Unix file format.
The only line that I found that works for me with all formats (Unix, Windows and Mac) is this one (in place of the marked line):

Code:

  $data = regexmatches($data, '[^\r\n]*', <crlf>);
For some reason it creates unexpected empty lines in between, but with the "e" flag in foreach this is no real problem (it may cost some extra time, but at least it works).
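
A possible explanation for the empty entries, sketched in Python rather than XYplorer script (only to illustrate the regex behavior; whether XYplorer's regex engine returns exactly the same matches is an assumption): the `*` quantifier also matches the empty string, so a pattern like `[^\r\n]*` can produce zero-length matches at the line-break characters. With `+` at least one character is required. The names `sample`, `star_matches` and `plus_matches` are made up for this example.

```python
import re

# Sample text mixing Windows (CRLF), Unix (LF) and classic Mac (CR) line endings.
sample = "one\r\ntwo\nthree\rfour"

# '*' also matches the empty string, so zero-length matches appear
# at the line-break characters and at the end of the text.
star_matches = re.findall(r"[^\r\n]*", sample)

# '+' requires at least one character, so only non-empty lines are returned.
plus_matches = re.findall(r"[^\r\n]+", sample)

print(star_matches)
print(plus_matches)
```

Note that the `+` variant also drops genuinely empty lines, which is the same effect as the "e" flag in foreach.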

BTW, the line count code also did not work for me, but I don't need it.

highend
Posts: 14940
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: Basic SC question: How best read a text file line by lin

Post by highend »

See my first and second post

Code:

$fileContent = formatlist(regexreplace(readfile("<full path>"), "\r?\n", "<crlf>"), "e", <crlf>);
That's doing everything that's necessary and you don't need the "e" param in the loop

autocart
Posts: 1384
Joined: 26 Sep 2013 15:22

Re: Basic SC question: How best read a text file line by lin

Post by autocart »

highend wrote:See my first and second post

Code:

$fileContent = formatlist(regexreplace(readfile("<full path>"), "\r?\n", "<crlf>"), "e", <crlf>);
That's doing everything that's necessary and you don't need the "e" param in the loop
Thx, highend, I did not know how to process ur first 2 msgs.
The line
regexreplace(readfile("<full path>"), "\r?\n", "<crlf>");
works for both Unix and Windows text file formats but not for Mac (in my tests using PSPad and Notepad++), since the (classic) Mac format has only "\r" at the end of each line.
Still, thx for the hints and brainstorming. I did find a working solution, and formatlist will be useful.

highend
Posts: 14940
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: Basic SC question: How best read a text file line by lin

Post by highend »

Code:

(\r\n|\r|\n)
or the shorter

Code:

(\r?\n|\r)
Both work for Windows, Unix, and Mac line endings.
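
To see the difference between the patterns discussed in this thread, here is a minimal sketch in Python (not XYplorer script; it only demonstrates how the regex behaves, and `LINE_BREAK`, `split_lines` and `mixed` are hypothetical names for illustration). It splits on `\r?\n|\r` and optionally drops empty lines, roughly what the regexreplace + formatlist combination above is meant to achieve.

```python
import re

# Matches CRLF (Windows), LF (Unix) or a lone CR (classic Mac).
LINE_BREAK = re.compile(r"\r?\n|\r")


def split_lines(text, skip_empty=True):
    """Split text on any of the three line-ending styles."""
    lines = LINE_BREAK.split(text)
    if skip_empty:
        # Drop empty entries, e.g. those produced by blank lines.
        lines = [line for line in lines if line]
    return lines


# Mixed line endings, including one blank line (a lone CR followed by CRLF).
mixed = "win\r\nunix\nmac\r\r\nlast"
print(split_lines(mixed))

# For comparison: "\r?\n" alone leaves CR-only text unsplit.
print(re.split(r"\r?\n", "a\rb"))
```

The order of the alternatives matters: `\r?\n` is tried first, so a CRLF pair is consumed as one line break rather than as two.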
