Bug #55200 str_getcsv parses lines incorrectly
Submitted: 2011-07-13 09:10 UTC
Modified: 2015-05-19 14:47 UTC
Votes:2
Avg. Score:4.0 ± 1.0
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:1 (50.0%)
From: dmitry dot dulepov at gmail dot com
Assigned:
Status: Verified
Package: Strings related
PHP Version: 5.6.9
OS:
Private report: No
CVE-ID: None
 [2011-07-13 09:10 UTC] dmitry dot dulepov at gmail dot com
Description:
------------
When fields are quoted *and* the separator is surrounded by spaces, the
spaces end up in the parsed fields. The following line:
"123" , "456"
should produce:
"123" and "456"
but it produces:
"123 " and " 456".

RFC 4180 suggests that if a field is quoted, only the text inside the
quotes is the content of the field. Here is the formal grammar:

record = field *(COMMA field)
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA

Thus spaces should appear in a field only if the field is not quoted.

Test script:
---------------
print_r(str_getcsv('"123"  , "456" ', ',', '"'));

Expected result:
----------------
array(
  0 => "123",
  1 => "456",
)


Actual result:
--------------
array(
  0 => "123  ",
  1 => "456 ",
)


History

 [2011-07-29 01:44 UTC] lonnyk at gmail dot com
str_getcsv is a conversion tool, not an input validator. Invalid input is
surely not going to work correctly, and I do not think it is good practice to
account for bad input, because that would cost extra processing time when it
is not needed.

If the input were:
var_dump(str_getcsv('"123" a , "456" ', ',', '"'));

what would you expect str_getcsv to do?
 [2011-07-29 05:42 UTC] dmitry dot dulepov at gmail dot com
It is not about input validation :) Your example is clearly invalid input. The
function should fail and return FALSE. My example fits the formal CSV grammar
perfectly, so it is valid input. It is just parsed incorrectly.

I would not send you a bug about invalid input :)
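
For illustration, the behavior requested above can be sketched in userland. This is only a sketch under stated assumptions, not how str_getcsv() works and not a proposed patch: `parse_csv_line_lenient()` is a hypothetical helper, it handles a single line with a one-byte separator and quote character, and it treats only spaces and tabs as ignorable around a quoted field. Unquoted fields keep their surrounding whitespace, and anything else outside the quotes of a quoted field makes it return FALSE.

```php
<?php
// Hypothetical helper, not part of PHP. Parses one CSV line, ignoring
// spaces/tabs around quoted fields. Returns an array of fields, or false
// when the line cannot be parsed under the rules described above.
function parse_csv_line_lenient(string $line, string $sep = ',', string $quote = '"')
{
    $fields = [];
    $len = strlen($line);
    $i = 0;
    while (true) {
        // Remember where the field starts, then look past leading blanks.
        $start = $i;
        while ($i < $len && ($line[$i] === ' ' || $line[$i] === "\t")) {
            $i++;
        }
        if ($i < $len && $line[$i] === $quote) {
            // Quoted field: read up to the closing quote; a doubled quote
            // stands for one literal quote character.
            $i++;
            $value = '';
            $closed = false;
            while ($i < $len) {
                if ($line[$i] === $quote) {
                    if ($i + 1 < $len && $line[$i + 1] === $quote) {
                        $value .= $quote;
                        $i += 2;
                        continue;
                    }
                    $closed = true;
                    $i++;
                    break;
                }
                $value .= $line[$i];
                $i++;
            }
            if (!$closed) {
                return false; // unterminated quoted field
            }
            // Only blanks may follow the closing quote before the separator.
            while ($i < $len && ($line[$i] === ' ' || $line[$i] === "\t")) {
                $i++;
            }
            if ($i < $len && $line[$i] !== $sep) {
                return false; // e.g. '"123" a , "456" '
            }
            $fields[] = $value;
        } else {
            // Unquoted field: keep everything, including surrounding blanks.
            $end = strpos($line, $sep, $start);
            if ($end === false) {
                $end = $len;
            }
            $value = substr($line, $start, $end - $start);
            if (strpos($value, $quote) !== false) {
                return false; // quote character inside an unquoted field
            }
            $fields[] = $value;
            $i = $end;
        }
        if ($i >= $len) {
            return $fields;
        }
        $i++; // skip the separator
    }
}
```

With this sketch, `parse_csv_line_lenient('"123"  , "456" ')` returns `['123', '456']`, and `parse_csv_line_lenient('"123" a , "456" ')` returns `false`.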
 [2015-05-19 13:27 UTC] cmb@php.net
-Status: Open
+Status: Verified
-Package: Unknown/Other Function
+Package: Strings related
 [2015-05-19 13:27 UTC] cmb@php.net
The given test input is not valid according to RFC 4180;
whitespace is not allowed between 'COMMA' and 'escaped'. However,
the current behavior of str_getcsv() doesn't make sense to me, see
<http://3v4l.org/roJnp> and <http://3v4l.org/XcrLG>. Returning
FALSE or at least giving a warning or notice for such input seems
to be more appropriate.

By the way, fgetcsv() shows the same behavior.
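
To make the validity question concrete, the grammar quoted in the report can be checked mechanically. The following is a minimal sketch, not part of PHP: `is_rfc4180_record()` is a hypothetical helper, and it assumes a relaxed TEXTDATA (any byte except DQUOTE, COMMA, CR and LF, whereas the RFC restricts it to printable ASCII).

```php
<?php
// Hypothetical helper, not part of PHP. Returns true when $record matches
//   record = field *(COMMA field)
// from RFC 4180, with TEXTDATA relaxed as described in the lead-in.
function is_rfc4180_record(string $record): bool
{
    // escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
    $escaped = '"(?:[^"]|"")*"';
    // non-escaped = *TEXTDATA
    $nonEscaped = '[^",\r\n]*';
    $field = "(?:$escaped|$nonEscaped)";
    // Anchor at both ends so the whole record must match the grammar.
    return preg_match("/\\A$field(?:,$field)*\\z/", $record) === 1;
}
```

Under these assumptions `is_rfc4180_record('"123","456"')` and `is_rfc4180_record('123 , 456')` are true, while `is_rfc4180_record('"123" , "456"')` is false, because nothing may stand between the closing DQUOTE of an escaped field and the next COMMA.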
 [2015-05-19 14:47 UTC] cmb@php.net
Interestingly, whitespace in front of an escaped field is simply
ignored, whereas other characters cause the field to be read as
if it were not escaped, see <http://3v4l.org/lGnOB>.
 [2015-05-19 14:47 UTC] cmb@php.net
-PHP Version: 5.3.6
+PHP Version: 5.6.9
 