Bug #40501 fgetcsv can't handle trailing odd number of backslashes
Submitted: 2007-02-15 20:11 UTC Modified: 2007-10-03 10:46 UTC
Votes:8
Avg. Score:5.0 ± 0.0
Reproduced:8 of 8 (100.0%)
Same Version:4 (50.0%)
Same OS:2 (25.0%)
From: mike at opendns dot com Assigned: dsp (profile)
Status: Closed Package: Filesystem function related
PHP Version: 5.2.1 OS: Linux, debian sarge
Private report: No CVE-ID: None
 [2007-02-15 20:11 UTC] mike at opendns dot com
Description:
------------
If an element in a CSV file ends with an odd number of trailing backslashes, fgetcsv fails to recognize the closing enclosure character that follows them.

This isn't a documentation problem -
http://www.rfc-editor.org/rfc/rfc4180.txt
Backslashes are not escape characters in CSV.

This was part of bug #39538.  The other half of that bug was correctly fixed.  This is still broken.

Reproduce code:
---------------
mikef@dev5:/tmp# cat -A csv.tmp
"this element contains the delimiter, and ends with an odd number of backslashes (ex: 1)\",and it isn't the last element$

mikef@dev5:/tmp# cat test_csv.php
<?php
$file = '/tmp/csv.tmp';

$h = fopen($file, 'r');
$data = fgetcsv($h);
fclose($h);

var_dump($data);
?>

Expected result:
----------------
mikef@dev5:/tmp# php test_csv.php
array(2) {
  [0]=>
  string(88) "this element contains the delimiter, and ends with an odd number of backslashes (ex: 1)\"
  [1]=>
  string(29) "and it isn't the last element"
}

Actual result:
--------------
mikef@dev5:/tmp# php test_csv.php
array(1) {
  [0]=>
  string(120) "this element contains the delimiter, and ends with an odd number of backslashes (ex: 1)\",and it isn't the last element"
}

 [2007-04-18 09:00 UTC] frapa02 at hotmail dot com
I am experiencing this issue which is preventing me from processing standard CSV file fields which have a backslash as the last character before the double quote enclosure. e.g.

"field 1","field 2","field 3\",field 4"

This will result in fgetcsv only returning 3 fields instead of 4.

Perhaps the best way to fix this is to remove the test for 'escape_char' in the 'php_fgetcsv' function. If backward compatibility is an issue, an additional parameter could be added to fgetcsv to enable the use of the escape character. Its default should be to use no escape character.
 [2007-04-18 09:29 UTC] frapa02 at hotmail dot com
Further thoughts on this...

The delimiter logic needs to be replaced according to the double quote character rules (modified) in the above documentation (rfc4180.txt), i.e. as follows:

*****************************
7.  If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.  For example:

       "aaa","b""bb","ccc"
*****************************

However, fgetcsv needs to replace the above rule for double quote with a test for the character supplied in the enclosure parameter. In other words, the delimiter character is always the enclosure character itself, but it can only delimit the same character.
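The rule described above can be sketched with str_getcsv(), which is available from PHP 5.3. This is only an illustration under the assumption that passing the enclosure as the escape character gives pure quote-doubling behaviour; parse_with_doubled_enclosure() is a hypothetical helper name, not part of PHP.

```php
<?php
// Sketch only: RFC 4180 style quote doubling, generalized to whatever
// enclosure character the caller chooses.

/**
 * parse_with_doubled_enclosure() is a hypothetical helper, not part of PHP.
 * It passes the enclosure as the escape character as well, so the only way
 * to embed the enclosure inside a field is to double it.
 */
function parse_with_doubled_enclosure($line, $enclosure)
{
    // Arguments: input string, delimiter, enclosure, escape.
    return str_getcsv($line, ',', $enclosure, $enclosure);
}

// The example line from RFC 4180, with the default double-quote enclosure.
var_dump(parse_with_doubled_enclosure('"aaa","b""bb","ccc"', '"'));

// The same rule applied to a single-quote enclosure.
var_dump(parse_with_doubled_enclosure("'aaa','b''bb','ccc'", "'"));
```

In both calls the middle field should come back with one embedded enclosure character (b"bb and b'bb respectively).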

Please email me if help is needed with analysis or testing.
 [2007-10-03 10:46 UTC] dsp@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

In PHP 5.3 there will be an additional escape character parameter. Setting both this parameter and the enclosure parameter to " will produce your expected result.
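A minimal sketch of that workaround, assuming PHP 5.3 or later. It parses the line from the original report through an in-memory stream instead of /tmp/csv.tmp; parse_csv_line() is a hypothetical helper introduced here for illustration, not part of PHP.

```php
<?php
// Sketch only: read the sample line from this report with the escape
// parameter set to the enclosure character.

/**
 * parse_csv_line() is a hypothetical helper, not part of PHP.
 * It writes one line to an in-memory stream and reads it back with
 * fgetcsv(), so that a backslash is treated as ordinary field data.
 */
function parse_csv_line($line)
{
    $h = fopen('php://memory', 'w+');
    fwrite($h, $line);
    rewind($h);
    // Arguments: stream, length (0 = no limit), delimiter, enclosure, escape.
    $fields = fgetcsv($h, 0, ',', '"', '"');
    fclose($h);
    return $fields === false ? array() : $fields;
}

// In a single-quoted PHP string, \\ is one backslash and \' is one quote.
$line = '"this element contains the delimiter, and ends with an odd number of backslashes (ex: 1)\\",and it isn\'t the last element' . "\n";

// With escape set to the enclosure, two fields are expected.
var_dump(parse_csv_line($line));
```

On PHP 7.4 and later, passing an empty string as the escape argument is another way to switch the backslash handling off.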
 