Bug #69181 READ_CSV|DROP_NEW_LINE drops newlines within fields
Submitted: 2015-03-04 11:44 UTC Modified: 2021-11-16 14:02 UTC
Votes:5
Avg. Score:3.4 ± 0.8
Reproduced:5 of 5 (100.0%)
Same Version:2 (40.0%)
Same OS:3 (60.0%)
From: greg dot bowler at g105b dot com Assigned:
Status: Closed Package: SPL related
PHP Version: 5.6.6 OS: Linux Ubuntu 14.04 LTS
Private report: No CVE-ID: None
 [2015-03-04 11:44 UTC] greg dot bowler at g105b dot com
Description:
------------
When using SplFileObject to parse a CSV file with the DROP_NEW_LINE flag set, newlines inside fields are dropped unexpectedly.

According to the documentation, the DROP_NEW_LINE flag drops newlines at the end of a line.

http://php.net/manual/en/class.splfileobject.php#splfileobject.constants.drop-new-line

The tests below show that the flag drops not only the newlines at the end of each line, but also the first newline within any field.

For example, the following raw csv:

col1,col2
testValue,one\ntwo\nthree

is interpreted like this when using the DROP_NEW_LINE flag:

col1,col2
testValue,onetwo\nthree


Test script:
---------------
// Create a new CSV file using good old fputcsv:
$csvLines = [
	["Name","Notes"],
	["Person 1","This row has no new line"],
	["Person 2","This row has\nthree\nnew\nlines"],
];
$filePath = "/tmp/data.csv";
$fp = fopen($filePath, "w");

foreach($csvLines as $line) {
	fputcsv($fp, $line);
}

fclose($fp);



// Iteration 1, using DROP_NEW_LINE:

$file = new SplFileObject($filePath, "r");
$file->setFlags(
	SplFileObject::READ_CSV |
	SplFileObject::READ_AHEAD |
	SplFileObject::SKIP_EMPTY |
	SplFileObject::DROP_NEW_LINE
);

foreach($file as $i => $row) {
	printf("%d \\n in %d ... " . PHP_EOL, substr_count($row[1], "\n"), $i);
}

echo PHP_EOL . PHP_EOL;


// Iteration 2, with no DROP_NEW_LINE flag

$file = new SplFileObject($filePath, "r");
$file->setFlags(
	SplFileObject::READ_CSV |
	SplFileObject::READ_AHEAD |
	SplFileObject::SKIP_EMPTY
);

foreach($file as $i => $row) {
	printf("%d \\n in %d ... " . PHP_EOL, substr_count($row[1], "\n"), $i);
}

Expected result:
----------------
Expected result is that the count of newlines in each iteration is the same.

There are no newlines at the end of the lines, so there should be none dropped.

Actual result:
--------------
Output of above test script:

0 \n in 0 ... 
0 \n in 1 ... 
2 \n in 2 ... 

0 \n in 0 ... 
0 \n in 1 ... 
3 \n in 2 ... 

Actual result is that, for a field containing newlines, the count is 1 less than expected (2 instead of 3 in row 2). The first \n character within the field is stripped.
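
As a possible workaround on affected versions, the same file can be read with plain fgetcsv() so that DROP_NEW_LINE is not involved at all. This is only a minimal sketch: countFieldNewlines() is a hypothetical helper name, the temporary file path is generated for illustration, and blank lines are skipped by hand rather than through SKIP_EMPTY.

```php
<?php
// Hypothetical helper (not part of SPL): count "\n" characters in the second
// column of every CSV record, reading with fgetcsv() instead of SplFileObject.
function countFieldNewlines(string $path): array
{
    $counts = [];
    $fp = fopen($path, "r");
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Separator, enclosure and escape are passed explicitly;
    // a length of 0 means the line length is not limited.
    while (($row = fgetcsv($fp, 0, ",", "\"", "\\")) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() reports a blank line as an array holding a single null
        }
        $counts[] = substr_count($row[1] ?? "", "\n");
    }
    fclose($fp);
    return $counts;
}

// Usage: write the same three rows as the test script, then count.
$path = tempnam(sys_get_temp_dir(), "csv");
$fp = fopen($path, "w");
$rows = [
    ["Name", "Notes"],
    ["Person 1", "This row has no new line"],
    ["Person 2", "This row has\nthree\nnew\nlines"],
];
foreach ($rows as $line) {
    fputcsv($fp, $line, ",", "\"", "\\");
}
fclose($fp);

print_r(countFieldNewlines($path)); // counts per row: 0, 0, 3
unlink($path);
```

Because the CSV parser itself consumes the record terminator, no separate newline-dropping step is needed in this approach.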

 [2015-03-04 11:47 UTC] greg dot bowler at g105b dot com
I am looking in ext/standard/file.c for where this behaviour could be changed, as I would like to contribute a patch myself, but can't figure out where the flags are being read.

Can someone point me in the right direction, and I will look at submitting a patch?
 [2015-03-04 12:56 UTC] greg dot bowler at g105b dot com
The culprit of this bug seems to be the assumptive use of `strcspn` on line 2055 of spl_directory.c

http://lxr.php.net/xref/PHP_TRUNK/ext/spl/spl_directory.c#2055

size_t strcspn ( const char * str1, const char * str2 );

Scans str1 for the first occurrence of any of the characters that are part of str2, returning the number of characters of str1 read before this first occurrence.

The key here is the word "any". It will return the length up to the first \n in the example above.
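
The strcspn semantics described above can be illustrated from PHP, which exposes a function of the same name. This is a rough sketch only: it operates on a whole record held in one string and does not reproduce the buffer handling in spl_directory.c. Both helper names are hypothetical, and stripTrailingLineBreak() is shown purely for contrast, not as the fix that was eventually committed.

```php
<?php
// Hypothetical helper: length of the text before the FIRST "\r" or "\n",
// which is what strcspn() reports for the character set "\r\n".
function lengthBeforeFirstLineBreak(string $record): int
{
    return strcspn($record, "\r\n");
}

// Hypothetical helper, for contrast: remove line-break characters only
// from the END of the string, leaving embedded ones untouched.
function stripTrailingLineBreak(string $record): string
{
    return rtrim($record, "\r\n");
}

// One CSV record whose second field contains two embedded newlines.
$record = "testValue,\"one\ntwo\nthree\"\n";

// strcspn() stops at the first newline, inside the quoted field.
$cut = lengthBeforeFirstLineBreak($record);
var_dump($cut);                       // int(14)
var_dump(substr($record, 0, $cut));   // string(14) "testValue,"one"

// Trailing-only stripping keeps both embedded newlines.
var_dump(substr_count(stripTrailingLineBreak($record), "\n")); // int(2)
```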
 [2019-07-03 14:59 UTC] stof at notk dot org
Is there any chance of getting a fix for this? I ran into this bug today, and it still affects current PHP versions.
See https://3v4l.org/MbRUS for the script I made.
 [2021-10-28 12:11 UTC] cmb@php.net
-Status: Open
+Status: Verified
-Assigned To:
+Assigned To: cmb
 [2021-10-28 12:11 UTC] cmb@php.net
The following pull request has been associated:

Patch Name: Fix #69181: READ_CSV|DROP_NEW_LINE drops newlines within fields
On GitHub:  https://github.com/php/php-src/pull/7618
Patch:      https://github.com/php/php-src/pull/7618.patch
 [2021-10-28 12:12 UTC] cmb@php.net
-Summary: csv parsing drops newlines within fields with SplFileObject::DROP_NEW_LINE
+Summary: READ_CSV|DROP_NEW_LINE drops newlines within fields
-Package: Filesystem function related
+Package: SPL related
 [2021-11-16 14:02 UTC] cmb@php.net
-Assigned To: cmb
+Assigned To:
 [2022-07-26 16:34 UTC] git@php.net
Automatic comment on behalf of cmb69
Revision: https://github.com/php/php-src/commit/5d52d472ef17faccd11b0236ac92d5b6037a647a
Log: Fix #69181: READ_CSV|DROP_NEW_LINE drops newlines within fields
 [2022-07-26 16:34 UTC] git@php.net
-Status: Verified
+Status: Closed
 