php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Request #55413 str_getcsv doesnt remove escape characters
Submitted: 2011-08-12 13:30 UTC Modified: 2021-07-27 10:06 UTC
Votes:49
Avg. Score:4.1 ± 1.2
Reproduced:43 of 46 (93.5%)
Same Version:13 (30.2%)
Same OS:10 (23.3%)
From: mathielen at gmail dot com Assigned: cmb (profile)
Status: Closed Package: Strings related
PHP Version: 5.3.6 OS: ubuntu 11.04
Private report: No CVE-ID: None
 [2011-08-12 13:30 UTC] mathielen at gmail dot com
Description:
------------
Escape-characters should only escape the next character if it is the delimiter-character. The Escape character itself should then be removed from the result.

Test script:
---------------
$line = '"A";"Some \"Stuff\"";"C"';
$token = str_getcsv($line, ';', '"', '\\');
print_r($token);

Expected result:
----------------
Array
(
    [0] => A
    [1] => Some "Stuff"
    [2] => C
)


Actual result:
--------------
Array
(
    [0] => A
    [1] => Some \"Stuff\"
    [2] => C
)

Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2011-11-27 13:58 UTC] xoneca at gmail dot com
The bug can be reproduced with any escape character but quote char.

Test script:
---------------
$line = '"A";"Some ""Stuff""";"C"';
$tokens = str_getcsv( $line, ';', '"', '"' );
print_r( $tokens );

Actual and Expected Result:
---------------
Array
(
    [0] => A
    [1] => Some "Stuff"
    [2] => C
)
 [2012-04-27 03:08 UTC] darren at dcook dot org
Another way of looking at the code in comment 1 is that the behaviour is correct (for parsing Excel-style csv), but the documentation is confusing. In my testing the "" within quotes is being handled correctly (and the $escape parameter is either not being used, or has not got in my way yet).

But as another viewpoint, if we take the original bug report example and do:
  $line = '"A";"Some \"Stuff\"";"C"'
  print_r(str_getcsv($line, ';', '"', 'x'));

(BTW, I'm using 'x' to mean no escaping; using a '' uses the default instead!!)

Output is:

Array
(
    [0] => A
    [1] => Some \Stuff\""
    [2] => C
)

This almost makes sense if you consider it treated the second field as three sub-strings:
  "Some \"
  Stuff\
  ""

The problem is, if that was true, the 3rd sub-string got parsed wrongly. The 3rd sub-string should have evaluated to a blank string.

Summary: something is wrong. Either there is a bug to fix, or the $escape parameter should be removed completely, or the function needs to document the intended behaviour for corner cases like these.
 [2012-05-14 14:30 UTC] spidgorny at gmail dot com
5.3.10 is affected too. A bug in a primitive function like this after years of 
evolution should be embarrassing.
 [2012-07-14 07:39 UTC] dan dot libby at gmail dot com
I just ran into this bug also.

I don't know the history, and haven't reviewed the str_getcsv() source yet but I am guessing that *getcsv() were originally implemented with excel style double-quote escaping.  Somehow the escape='\\' param got added to the documentation, but seemingly not the code.

Defaulting escape='\\' as the documentation says would potentially break apps depending on escape='"'.  So that would be a breaking change, and a bad idea.

But leaving it as supporting only escape='"' is also bad, because it limits the utility of the function.  For example, I need to parse apache logs, and apache only supports escaping with \.   whoops.

So I believe the correct fix would be to default to escape='"' so we don't break apps using it with defaults, but still support explicit use of escape='\\'.

agree?  disagree?
 [2012-07-15 04:07 UTC] darren at dcook dot org
Yes, agree:
 1. change docs to say $escape defaults to '"'
 2. Change code to use $escape when it is something else

(NB. IIUC, this won't break backwards compatibility.)
 [2014-10-24 17:08 UTC] desertshadow at gmail dot com
Has this bug really been open for 3 years? This is a pretty big bug, 
1) The docs are incorrect
2) The CSV parser isn't working correctly

I'm trying to escape a comma in a CSV string but it doesn't appear to be escaping correctly.
 [2017-07-24 21:03 UTC] colinodell@php.net
According to RFC 4180 Common Format and MIME Type for CSV Files:

> 7.  If double-quotes are used to enclose fields, then a double-quote
>     appearing inside a field must be escaped by preceding it with
>     another double quote.  For example:
>
>     "aaa","b""bb","ccc"

PHP does indeed support this escaping method which is common amongst most other CSV implementations: https://3v4l.org/e0aX8
 [2017-07-24 21:15 UTC] colinodell@php.net
-Status: Open +Status: Wont fix -Type: Bug +Type: Feature/Change Request -Assigned To: +Assigned To: colinodell
 [2017-07-24 21:15 UTC] colinodell@php.net
Closing this because PHP does indeed have the ability to process escaped characters, but they must be escaped the CSV way.

(Processing backslash-escaped characters would therefore be a feature change which may impact backward compatibility.)
 [2017-07-24 21:45 UTC] dan dot libby at gmail dot com
imho, this bug should not have been closed without at least updating the documentation to match what the code actually does.

I just checked, and the online docs still say default="\\" without any comment that this is basically a no-op and doesn't escape anything.

In order to achieve escaping then, one must use/have excel-style escaping in the source document AND explicitly set escape='"'.

That seems totally broken and bizarre to me.

Further, one cannot always control the format of source document.  Consider if reading files from a third-party.

I would urge you to reconsider and either fix docs or fix the function to be more useful.
 [2017-07-24 23:39 UTC] colinodell@php.net
-Status: Wont fix +Status: Open -Assigned To: colinodell +Assigned To:
 [2017-07-24 23:39 UTC] colinodell@php.net
Thanks for bringing the documentation to my attention - I hadn't realized that was there.  Re-opening as requested.
 [2021-07-27 10:06 UTC] cmb@php.net
-Status: Open +Status: Closed -Assigned To: +Assigned To: cmb
 [2021-07-27 10:06 UTC] cmb@php.net
The documentation has been updated a while ago[1].

Note that we can neither remove the escape parameter nor change
its default value.  As of PHP 7.4.0, it is possible to pass an
empty string as escape, which disables the proprietary escaping.

[1] <https://github.com/php/doc-en/commit/9421e06e34c689a3511a3fe889db295540bcf3a5>
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Nov 22 22:01:30 2024 UTC