php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Request #55413 str_getcsv doesnt remove escape characters
Submitted: 2011-08-12 13:30 UTC Modified: 2017-07-24 23:39 UTC
Votes:47
Avg. Score:4.0 ± 1.2
Reproduced:41 of 44 (93.2%)
Same Version:13 (31.7%)
Same OS:10 (24.4%)
From: mathielen at gmail dot com Assigned:
Status: Open Package: Strings related
PHP Version: 5.3.6 OS: ubuntu 11.04
Private report: No CVE-ID: None
View Add Comment Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
You can add a comment by following this link or if you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: mathielen at gmail dot com
New email:
PHP Version: OS:

 

 [2011-08-12 13:30 UTC] mathielen at gmail dot com
Description:
------------
Escape-characters should only escape the next character if it is the delimiter-character. The Escape character itself should then be removed from the result.

Test script:
---------------
$line = '"A";"Some \"Stuff\"";"C"';
$token = str_getcsv($line, ';', '"', '\\');
print_r($token);

Expected result:
----------------
Array
(
    [0] => A
    [1] => Some "Stuff"
    [2] => C
)


Actual result:
--------------
Array
(
    [0] => A
    [1] => Some \"Stuff\"
    [2] => C
)

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2011-11-27 13:58 UTC] xoneca at gmail dot com
The bug can be reproduced with any escape character but quote char.

Test script:
---------------
$line = '"A";"Some ""Stuff""";"C"';
$tokens = str_getcsv( $line, ';', '"', '"' );
print_r( $tokens );

Actual and Expected Result:
---------------
Array
(
    [0] => A
    [1] => Some "Stuff"
    [2] => C
)
 [2012-04-27 03:08 UTC] darren at dcook dot org
Another way of looking at the code in comment 1 is that the behaviour is correct (for parsing Excel-style csv), but the documentation is confusing. In my testing the "" within quotes is being handled correctly (and the $escape parameter is either not being used, or has not got in my way yet).

But as another viewpoint, if we take the original bug report example and do:
  $line = '"A";"Some \"Stuff\"";"C"'
  print_r(str_getcsv($line, ';', '"', 'x'));

(BTW, I'm using 'x' to mean no escaping; using a '' uses the default instead!!)

Output is:

Array
(
    [0] => A
    [1] => Some \Stuff\""
    [2] => C
)

This almost makes sense if you consider it treated the second field as three sub-strings:
  "Some \"
  Stuff\
  ""

The problem is, if that was true, the 3rd sub-string got parsed wrongly. The 3rd sub-string should have evaluated to a blank string.

Summary: something is wrong. Either there is a bug to fix, or the $escape parameter should be removed completely, or the function needs to document the intended behaviour for corner cases like these.
 [2012-05-14 14:30 UTC] spidgorny at gmail dot com
5.3.10 is affected too. A bug in a primitive function like this after years of 
evolution should be embarrassing.
 [2012-07-14 07:39 UTC] dan dot libby at gmail dot com
I just ran into this bug also.

I don't know the history, and haven't reviewed the str_getcsv() source yet but I am guessing that *getcsv() were originally implemented with excel style double-quote escaping.  Somehow the escape='\\' param got added to the documentation, but seemingly not the code.

Defaulting escape='\\' as the documentation says would potentially break apps depending on escape='"'.  So that would be a breaking change, and a bad idea.

But leaving it as supporting only escape='"' is also bad, because it limits the utility of the function.  For example, I need to parse apache logs, and apache only supports escaping with \.   whoops.

So I believe the correct fix would be to default to escape='"' so we don't break apps using it with defaults, but still support explicit use of escape='\\'.

agree?  disagree?
 [2012-07-15 04:07 UTC] darren at dcook dot org
Yes, agree:
 1. change docs to say $escape defaults to '"'
 2. Change code to use $escape when it is something else

(NB. IIUC, this won't break backwards compatibility.)
 [2014-10-24 17:08 UTC] desertshadow at gmail dot com
Has this bug really been open for 3 years? This is a pretty big bug, 
1) The docs are incorrect
2) The CSV parser isn't working correctly

I'm trying to escape a comma in a CSV string but it doesn't appear to be escaping correctly.
 [2017-07-24 21:03 UTC] colinodell@php.net
According to RFC 4180 Common Format and MIME Type for CSV Files:

> 7.  If double-quotes are used to enclose fields, then a double-quote
>     appearing inside a field must be escaped by preceding it with
>     another double quote.  For example:
>
>     "aaa","b""bb","ccc"

PHP does indeed support this escaping method which is common amongst most other CSV implementations: https://3v4l.org/e0aX8
 [2017-07-24 21:15 UTC] colinodell@php.net
-Status: Open +Status: Wont fix -Type: Bug +Type: Feature/Change Request -Assigned To: +Assigned To: colinodell
 [2017-07-24 21:15 UTC] colinodell@php.net
Closing this because PHP does indeed have the ability to process escaped characters, but they must be escaped the CSV way.

(Processing backslash-escaped characters would therefore be a feature change which may impact backward compatibility.)
 [2017-07-24 21:45 UTC] dan dot libby at gmail dot com
imho, this bug should not have been closed without at least updating the documentation to match what the code actually does.

I just checked, and the online docs still say default="\\" without any comment that this is basically a no-op and doesn't escape anything.

In order to achieve escaping then, one must use/have excel-style escaping in the source document AND explicitly set escape='"'.

That seems totally broken and bizarre to me.

Further, one cannot always control the format of source document.  Consider if reading files from a third-party.

I would urge you to reconsider and either fix docs or fix the function to be more useful.
 [2017-07-24 23:39 UTC] colinodell@php.net
-Status: Wont fix +Status: Open -Assigned To: colinodell +Assigned To:
 [2017-07-24 23:39 UTC] colinodell@php.net
Thanks for bringing the documentation to my attention - I hadn't realized that was there.  Re-opening as requested.
 
PHP Copyright © 2001-2017 The PHP Group
All rights reserved.
Last updated: Sun Nov 19 01:31:42 2017 UTC