Bug #69181 csv parsing drops newlines within fields with SplFileObject::DROP_NEW_LINE
Submitted: 2015-03-04 11:44 UTC Modified: -
Votes:4
Avg. Score:3.2 ± 0.8
Reproduced:4 of 4 (100.0%)
Same Version:2 (50.0%)
Same OS:2 (50.0%)
From: greg dot bowler at g105b dot com Assigned:
Status: Open Package: Filesystem function related
PHP Version: 5.6.6 OS: Linux Ubuntu 14.04 LTS
Private report: No CVE-ID: None

 [2015-03-04 11:44 UTC] greg dot bowler at g105b dot com
Description:
------------
When using SplFileObject to parse a CSV file, unexpected behaviour occurs.

According to the documentation, the DROP_NEW_LINE flag drops newlines at the end of a line.

http://php.net/manual/en/class.splfileobject.php#splfileobject.constants.drop-new-line

The tests below show that the flag drops not only the newline at the end of each line, but also the first newline inside any field.

For example, the following raw csv:

col1,col2
testValue,one\ntwo\nthree

is interpreted like this when using the DROP_NEW_LINE flag:

col1,col2
testValue,onetwo\nthree


Test script:
---------------
// Create a new CSV file using good old fputcsv:
$csvLines = [
	["Name","Notes"],
	["Person 1","This row has no new line"],
	["Person 2","This row has\nthree\nnew\nlines"],
];
$filePath = "/tmp/data.csv";
$fp = fopen($filePath, "w");

foreach($csvLines as $line) {
	fputcsv($fp, $line);
}

fclose($fp);



// Iteration 1, using DROP_NEW_LINE:

$file = new SplFileObject($filePath, "r");
$file->setFlags(
	SplFileObject::READ_CSV |
	SplFileObject::READ_AHEAD |
	SplFileObject::SKIP_EMPTY |
	SplFileObject::DROP_NEW_LINE
);

foreach($file as $i => $row) {
	printf("%d \\n in %d ... " . PHP_EOL, substr_count($row[1], "\n"), $i);
}

echo PHP_EOL . PHP_EOL;


// Iteration 2, with no DROP_NEW_LINE flag

$file = new SplFileObject($filePath, "r");
$file->setFlags(
	SplFileObject::READ_CSV |
	SplFileObject::READ_AHEAD |
	SplFileObject::SKIP_EMPTY
);

foreach($file as $i => $row) {
	printf("%d \\n in %d ... " . PHP_EOL, substr_count($row[1], "\n"), $i);
}

Expected result:
----------------
Expected result is that the count of newlines in each iteration is the same.

There are no newlines at the end of the lines, so there should be none dropped.

Actual result:
--------------
Output of above test script:

0 \n in 0 ... 
0 \n in 1 ... 
2 \n in 2 ... 

0 \n in 0 ... 
0 \n in 1 ... 
3 \n in 2 ... 

With DROP_NEW_LINE set, a field that contains newlines comes back with one newline fewer than expected: the first \n character within the field is stripped.
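
As a baseline for comparison, here is a minimal sketch that writes one multi-line row and reads it back with plain fgetcsv, which does not involve the SplFileObject flags at all. The temp file name and the variable names are assumptions made for illustration; the delimiter, enclosure and escape characters are passed explicitly rather than relying on defaults.

```php
<?php
// Hypothetical temp file; any writable path would do.
$path = tempnam(sys_get_temp_dir(), "csv");

// Write a single row whose second field contains three newlines.
$fp = fopen($path, "w");
fputcsv($fp, ["Person 2", "This row has\nthree\nnew\nlines"], ",", '"', "\\");
fclose($fp);

// Read the row back with fgetcsv (length 0 means no line-length limit).
$fp = fopen($path, "r");
$row = fgetcsv($fp, 0, ",", '"', "\\");
fclose($fp);
unlink($path);

// Count the newlines that survived the round trip.
$newlineCount = substr_count($row[1], "\n");
echo $newlineCount, PHP_EOL; // 3
```

All three embedded newlines survive here, which is the count the DROP_NEW_LINE iteration would be expected to report as well.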

History

 [2015-03-04 11:47 UTC] greg dot bowler at g105b dot com
I am looking in ext/standard/file.c for where this behaviour could be changed, as I would like to contribute a patch myself, but can't figure out where the flags are being read.

Can someone point me in the right direction, and I will look at submitting a patch?
 [2015-03-04 12:56 UTC] greg dot bowler at g105b dot com
The culprit of this bug seems to be the assumptive use of `strcspn` on line 2055 of spl_directory.c

http://lxr.php.net/xref/PHP_TRUNK/ext/spl/spl_directory.c#2055

size_t strcspn ( const char * str1, const char * str2 );

Scans str1 for the first occurrence of any of the characters that are part of str2, returning the number of characters of str1 read before this first occurrence.

The key here is the word "any". It will return the length up to the first \n in the example above.
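
To illustrate the strcspn semantics described above, here is a small sketch using PHP's own strcspn, which behaves like the C function. It is not the code from spl_directory.c; the buffer contents and variable names are assumptions made for illustration. The rtrim call is included only as a contrast, to show an operation that touches trailing newline characters alone.

```php
<?php
// Illustrative buffer: a field value with embedded newlines plus a trailing one.
$buffer = "This row has\nthree\nnew\nlines\n";

// Length of the initial segment that contains neither "\r" nor "\n".
$prefixLength = strcspn($buffer, "\r\n");

// Cutting the buffer at that offset keeps only the text before the FIRST newline.
$truncated = substr($buffer, 0, $prefixLength);

// By contrast, rtrim with the same character list removes only trailing newlines.
$trimmed = rtrim($buffer, "\r\n");

var_dump($prefixLength);                // int(12)
var_dump($truncated);                   // string(12) "This row has"
var_dump(substr_count($trimmed, "\n")); // int(3)
```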
 