Basic SC question: How best read a text file line by line?

Posted: 07 Sep 2016 15:49
by autocart
I can't find the confirmation for my assumption:

I want to read a file and process its content line by line.
What I came up with:

Code:

$fileContent = readfile("[path]");
foreach($line, $fileContent, "<crlf>", "e") //"e" to skip empty lines
{
  //do something with $line
}
Is this the best way to do it?
E.g., is <crlf> one character or two? The help file makes it look like one ("<crlf> Carriage Return Line Feed (0x0D0A)").
But what if the line "separators" consist of only a "cr" character or only an "lf" character? Or, if <crlf> is one char, what if the file contains cr + lf as 2 chars on each line?

Or maybe there is a faster way for large files?? I don't know, I am asking.

Besides, are there any other pitfalls that one could fall into with the code I posted? (Don't take this last question too literally. Read between the words, please. :D )

EDIT: A related thread that I have looked at, e.g.: "How to read UNICODE file content with readfile()?"

Posted: 07 Sep 2016 15:57
by highend
It depends on what needs to be done with each line. For a huge file with many lines, a foreach loop isn't fast by default...

Regarding <crlf>: I haven't tried different line endings (Unix, Mac OS) yet. If in doubt, do a simple regex replace once before the loop.
"\r?\n" is the correct pattern for all available line endings.
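
A quick way to check which line endings a pattern such as "\r?\n" covers is to run it against one sample per convention. The sketch below uses Python's re module purely as an illustration. It is not XYplorer script, XYplorer's regex engine may differ in details, and the sample strings and names are made up.

```python
import re

# Hypothetical sample strings, one per line-ending convention.
samples = {
    "windows (CR+LF)": "one\r\ntwo\r\nthree",
    "unix (LF)": "one\ntwo\nthree",
    "old mac (CR)": "one\rtwo\rthree",
}

pattern = re.compile(r"\r?\n")

def count_breaks(text):
    """Return how many line breaks the pattern finds in text."""
    return len(pattern.findall(text))

results = {name: count_breaks(text) for name, text in samples.items()}
for name, count in results.items():
    print(f"{name}: {count} line break(s) matched")
```

In Python, the CR-only sample yields no matches, so files with bare CR line endings would need an extra alternative in the pattern.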

Posted: 07 Sep 2016 16:00
by bdeshi
<crlf> as used by XYplorer is always the two characters CR+LF.

Posted: 07 Sep 2016 16:02
by autocart
Thx very much, highend, for this quick info.
Also thx to u, Sammay.

Since it is not the fastest, do u know of another way inside XY that would be faster? Thx.

Posted: 07 Sep 2016 16:13
by bdeshi
Here's a Windows+Unix compatible line parser I came up with.

Code:

$data = readfile('file.txt');
// show line count
$lineCount = gettoken(regexmatches($data, '\r?\n'), 'count', '|');
echo "found " . $lineCount . " lines";

// normalize into <crlf> linebreak.
// XY's regex functions search by line and apparently use
// \r?\n to find line end, hence finds both Windows and Unix lines
$data = regexmatches($data, '^.*$', <crlf>);
foreach ($line, $data, <crlf>) {
  text $line;
}
Edit: This doesn't address the speed issue with foreach loops that highend brought up. However, "huge" and "speed" may be relative terms in this case. An 875 KB / 1773-line text file was parsed in 485056 microseconds (replacing the earlier, mistranscribed figure of 55006), or almost exactly half a second.

Posted: 07 Sep 2016 16:14
by highend
I never tested whether a while loop would be faster than a foreach loop. From my own tests of foreach speed: on a 3.4 GHz quad-core i5, a foreach loop runs about 1k iterations per second, without doing anything in the loop body. Processing real data in each line with multiple commands... -> ... So where possible I'd always try to use things like regexreplace / regexmatches, formatlist, etc., but it all depends on what needs to be done.

Posted: 08 Sep 2016 06:00
by RalphM
SammaySarkar wrote: ...An 875kb/1773lines text file was parsed in 55006 milliseconds, or about (almost exactly) half a second.
Sorry to say, Sammay, but something is not quite right with this calculation?!

Posted: 08 Sep 2016 13:51
by bdeshi
What? Where did you get that? :twisted:
Kidding. I did some weird conversion in my head which wasn't transcribed properly in the text there (and now() was short by one 'f'). I replaced it with another test result. (Also, apparently my HDD read speed is faster before sundown.)
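
The unit conversion behind the corrected figure can be checked in a few lines. This is only arithmetic on the numbers quoted in the thread (485056 microseconds, 1773 lines), written in Python for convenience. The variable names are made up.

```python
# Figures quoted in the thread.
elapsed_us = 485_056   # reported parse time in microseconds
line_count = 1_773     # lines in the test file

elapsed_s = elapsed_us / 1_000_000          # microseconds -> seconds
lines_per_second = line_count / elapsed_s   # average throughput

print(f"{elapsed_s:.3f} s total, about {lines_per_second:.0f} lines per second")
```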

Posted: 13 Sep 2016 10:29
by autocart
SammaySarkar wrote: Here's a Windows+Unix compatible line parser I came up with.

Code:

$data = readfile('file.txt');
// show line count
$lineCount = gettoken(regexmatches($data, '\r?\n'), 'count', '|');
echo "found " . $lineCount . " lines";

// normalize into <crlf> linebreak.
// XY's regex functions search by line and apparently use
// \r?\n to find line end, hence finds both Windows and Unix lines
$data = regexmatches($data, '^.*$', <crlf>); // <<<<<<------EXCHANGE THIS LINE!!!!!!!!!!!!!!!!!!!
foreach ($line, $data, <crlf>) {
  text $line;
}
Thx, Sammay, but ur code works for me only with the Unix file format.
The only line I found that works for me in all formats (Unix, Windows and Mac) is (instead of the marked line):

Code:

  $data = regexmatches($data, '[^\r\n]*', <crlf>);
For some reason it creates unexpected empty lines in between, but with the flag "e" in foreach this is no real problem (maybe it consumes extra time, but at least it works).

BTW, the line count code also did not work for me, but I don't need it.
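
One likely reason for the empty entries: "[^\r\n]*" uses "*", which also matches zero characters, so a regex engine can report an empty match at each CR or LF and at the end of the text. The sketch below shows this with Python's re module (Python 3.7 or later). It is not XYplorer script, XYplorer's engine may report empty matches differently, and the sample string is made up.

```python
import re

# Hypothetical sample with Windows (CR+LF) line endings.
text = "alpha\r\nbeta\r\ngamma"

# "*" allows zero-length matches, so empty strings show up in the result.
star_matches = re.findall(r"[^\r\n]*", text)

# "+" requires at least one character, so only real line content matches.
plus_matches = re.findall(r"[^\r\n]+", text)

print(star_matches)
print(plus_matches)
```

Whether "[^\r\n]+" behaves the same way in XYplorer's regexmatches was not tested in the thread. Note also that "+" would drop blank lines, much like the "e" flag does.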

Posted: 13 Sep 2016 10:49
by highend
See my first and second posts.

Code:

$fileContent = formatlist(regexreplace(readfile("<full path>"), "\r?\n", "<crlf>"), "e", <crlf>);
That does everything that's necessary, and you don't need the "e" param in the loop.

Posted: 13 Sep 2016 11:33
by autocart
highend wrote: See my first and second post

Code:

$fileContent = formatlist(regexreplace(readfile("<full path>"), "\r?\n", "<crlf>"), "e", <crlf>);
That's doing everything that's necessary and you don't need the "e" param in the loop
Thx, highend, I did not know how to process ur first 2 msgs.
The line
regexreplace(readfile("<full path>"), "\r?\n", "<crlf>");
works for both Unix and Windows text file formats but not for Mac (in my tests using PSPad and Notepad++), since Mac has only "\r" at the end of each line.
Still, thx for the hints and brainstorming. I did find a working solution and formatlist shall be useful.

Posted: 13 Sep 2016 12:08
by highend

Code:

(\r\n|\r|\n)
or the shorter

Code:

(\r?\n|\r)
works for Windows, Unix, and Mac.
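
To see the shorter pattern at work, the sketch below normalizes all three line-ending styles to CR+LF and then keeps only the non-empty lines, mirroring the regexreplace/formatlist idea from earlier in the thread. It is written in Python as an illustration only. The helper name split_lines_normalized and the sample strings are made up, and XYplorer's regex engine may differ in details.

```python
import re

# CR+LF, LF, or bare CR (the shorter pattern from the post above).
LINE_BREAK = re.compile(r"\r?\n|\r")

def split_lines_normalized(text):
    """Normalize every line break to CR+LF, then return the non-empty lines."""
    normalized = LINE_BREAK.sub("\r\n", text)
    return [line for line in normalized.split("\r\n") if line]

# Hypothetical samples, each containing one blank line.
samples = {
    "windows": "one\r\n\r\ntwo\r\n",
    "unix": "one\n\ntwo\n",
    "old mac": "one\r\rtwo\r",
}

for name, text in samples.items():
    print(name, split_lines_normalized(text))
```

In the longer form, "\r\n" has to come before "\r" in the alternation. Otherwise, in engines that try alternatives left to right, a CR+LF pair would be counted as two separate breaks.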