Bug #69181 csv parsing drops newlines within fields with SplFileObject::DROP_NEW_LINE
Submitted: 2015-03-04 11:44 UTC Modified: -
Votes:5
Avg. Score:3.4 ± 0.8
Reproduced:5 of 5 (100.0%)
Same Version:2 (40.0%)
Same OS:3 (60.0%)
From: greg dot bowler at g105b dot com Assigned:
Status: Open Package: Filesystem function related
PHP Version: 5.6.6 OS: Linux Ubuntu 14.04 LTS
Private report: No CVE-ID: None
 [2015-03-04 11:44 UTC] greg dot bowler at g105b dot com
Description:
------------
When using SplFileObject to parse a CSV file, unexpected behaviour occurs.

According to the documentation, the DROP_NEW_LINE flag drops newlines at the end of a line.

http://php.net/manual/en/class.splfileobject.php#splfileobject.constants.drop-new-line

The tests below show that, with the flag set, not only are the newlines at the end of each line dropped, but the first newline inside any field is dropped as well.

For example, the following raw csv:

col1,col2
testValue,one\ntwo\nthree

is interpreted like this when using the DROP_NEW_LINE flag:

col1,col2
testValue,onetwo\nthree


Test script:
---------------
// Create a new CSV file using good old fputcsv:
$csvLines = [
	["Name","Notes"],
	["Person 1","This row has no new line"],
	["Person 2","This row has\nthree\nnew\nlines"],
];
$filePath = "/tmp/data.csv";
$fp = fopen($filePath, "w");

foreach($csvLines as $line) {
	fputcsv($fp, $line);
}

fclose($fp);



// Iteration 1, using DROP_NEW_LINE:

$file = new SplFileObject($filePath, "r");
$file->setFlags(
	SplFileObject::READ_CSV |
	SplFileObject::READ_AHEAD |
	SplFileObject::SKIP_EMPTY |
	SplFileObject::DROP_NEW_LINE
);

foreach($file as $i => $row) {
	printf("%d \\n in %d ... " . PHP_EOL, substr_count($row[1], "\n"), $i);
}

echo PHP_EOL . PHP_EOL;


// Iteration 2, with no DROP_NEW_LINE flag

$file = new SplFileObject($filePath, "r");
$file->setFlags(
	SplFileObject::READ_CSV |
	SplFileObject::READ_AHEAD |
	SplFileObject::SKIP_EMPTY
);

foreach($file as $i => $row) {
	printf("%d \\n in %d ... " . PHP_EOL, substr_count($row[1], "\n"), $i);
}

Expected result:
----------------
Expected result is that the count of newlines in each iteration is the same.

There are no newlines at the end of the lines, so there should be none dropped.

Actual result:
--------------
Output of above test script:

0 \n in 0 ... 
0 \n in 1 ... 
2 \n in 2 ... 

0 \n in 0 ... 
0 \n in 1 ... 
3 \n in 2 ... 

The actual result is that, with DROP_NEW_LINE set, the count for a field containing newlines is one less than expected (2 instead of 3 above). The first \n character within a field is stripped.

 [2015-03-04 11:47 UTC] greg dot bowler at g105b dot com
I am looking in ext/standard/file.c for where this behaviour could be changed, as I would like to contribute a patch myself, but I can't figure out where the flags are being read.

Can someone point me in the right direction, and I will look at submitting a patch?
 [2015-03-04 12:56 UTC] greg dot bowler at g105b dot com
The culprit of this bug seems to be the assumptive use of `strcspn` on line 2055 of spl_directory.c

http://lxr.php.net/xref/PHP_TRUNK/ext/spl/spl_directory.c#2055

size_t strcspn ( const char * str1, const char * str2 );

Scans str1 for the first occurrence of any of the characters that are part of str2, returning the number of characters of str1 read before this first occurrence.

The key here is the word "any". It will return the length up to the first \n in the example above.
 [2019-07-03 14:59 UTC] stof at notk dot org
Any chance of getting a fix for this? I ran into this bug today, and it still affects current PHP versions.
See https://3v4l.org/MbRUS for the script I made.
 