Bug #55763 str_getcsv incorrectly handles line-breaks inside fields
Submitted: 2011-09-22 16:41 UTC Modified: 2019-06-18 06:40 UTC
Votes:32
Avg. Score:4.2 ± 0.8
Reproduced:31 of 32 (96.9%)
Same Version:9 (29.0%)
Same OS:5 (16.1%)
From: talk at alexmingoia dot com Assigned: cmb (profile)
Status: Not a bug Package: Strings related
PHP Version: 5.3.8 OS: OS X 10.6
Private report: No CVE-ID: None
 [2011-09-22 16:41 UTC] talk at alexmingoia dot com
Description:
------------
RFC4180 states that fields can contain line breaks as long as they are properly enclosed by double-quotes.

str_getcsv treats line-breaks inside of enclosed fields as new records in the CSV.

Setting 'auto_detect_line_endings' to TRUE or using "\r\n" instead of "\n" still produces incorrect results.

Test script:
---------------
$csv = file_get_contents('test.csv');
$csvArray = str_getcsv($csv, "\n");
var_dump($csvArray);

Expected result:
----------------
array(4) {
  [0]=>
  string(15) "Name,Desc,Email"
  [1]=>
  string(4) "Alex"
  [2]=>
  string(18) "Is a PHP developer"
  [3]=>
  string(16) "alex@example.com"
}

Actual result:
--------------
array(4) {
  [0]=>
  string(15) "Name,Desc,Email"
  [1]=>
  string(14) "Alex,"Is a PHP"
  [2]=>
  string(9) "developer"
  [3]=>
  string(17) ",alex@example.com"
}

 [2011-09-22 16:45 UTC] talk at alexmingoia dot com
Sorry... expected output should be

array(4) {
  [0]=>
  string(15) "Name,Desc,Email"
  [1]=>
  string(4) "Alex"
  [2]=>
  string(18) "Is a PHP
developer
"
  [3]=>
  string(16) "alex@example.com"
}
 [2012-04-27 03:11 UTC] darren at dcook dot org
The problem can also be shown with the example from the Wikipedia page (http://en.wikipedia.org/wiki/Comma-separated_values):

$s2=<<<EOD
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
EOD;

$lines=str_getcsv($s2,"\n");
print_r($lines);

It outputs:
Array
(
    [0] => Year,Make,Model,Description,Price
    [1] => 1997,Ford,E350,"ac, abs, moon",3000.00
    [2] => 1999,Chevy,"Venture ""Extended Edition""","",4900.00
    [3] => 1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
    [4] => 1996,Jeep,Grand Cherokee,"MUST SELL!
    [5] => air, moon roof, loaded",4799.00
)

But it should output:
Array
(
    [0] => Year,Make,Model,Description,Price
    [1] => 1997,Ford,E350,"ac, abs, moon",3000.00
    [2] => 1999,Chevy,"Venture ""Extended Edition""","",4900.00
    [3] => 1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
    [4] => 1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
)
 [2013-08-22 03:09 UTC] alotacents at gmail dot com
To split the string into record lines, I used a regular expression that avoids splitting inside double quotes, instead of using str_getcsv for that step. Then I ran str_getcsv on each line.

Example:

$s2=<<<EOD
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
EOD;

$lines = preg_split('/[\r\n]{1,2}(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/',$s2);

it outputs
Array (
 [0] => Year,Make,Model,Description,Price
 [1] => 1997,Ford,E350,"ac, abs, moon",3000.00
 [2] => 1999,Chevy,"Venture ""Extended Edition""","",4900.00
 [3] => 1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
 [4] => 1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",4799.00 
) 

To further convert each record into fields:

$data = array();
foreach($lines as $row) {
 $data[] = str_getcsv($row);
}

print_r($data);

which will output

Array (
 [0] => Array (
   [0] => Year
   [1] => Make
   [2] => Model
   [3] => Description
   [4] => Price
 )
 [1] => Array (
   [0] => 1997
   [1] => Ford
   [2] => E350
   [3] => ac, abs, moon
   [4] => 3000.00
 )
 [2] => Array (
   [0] => 1999
   [1] => Chevy
   [2] => Venture "Extended Edition"
   [3] =>
   [4] => 4900.00
 )
 [3] => Array (
   [0] => 1999
   [1] => Chevy
   [2] => Venture "Extended Edition, Very Large"
   [3] => 
   [4] => 5000.00
 )
 [4] => Array (
   [0] => 1996
   [1] => Jeep
   [2] => Grand Cherokee
   [3] => MUST SELL! air, moon roof, loaded
   [4] => 4799.00
 )
)
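
The lookahead in the regular expression above rescans the rest of the string at every line break, which can get slow on large inputs. As an alternative, here is a minimal sketch of a single-pass splitter that tracks whether the current position is inside double quotes. The function name split_csv_records is hypothetical, the code assumes PHP 7+ and the double quote as the only enclosure character, and blank lines come back as empty records.

```php
<?php
// Hypothetical helper: split CSV text into records, ignoring line breaks
// that occur inside double-quoted fields.
function split_csv_records(string $csv): array
{
    $records  = [];
    $current  = '';
    $inQuotes = false;
    $len      = strlen($csv);

    for ($i = 0; $i < $len; $i++) {
        $ch = $csv[$i];
        if ($ch === '"') {
            // An escaped quote ("") toggles twice, so the state is unchanged.
            $inQuotes = !$inQuotes;
            $current .= $ch;
        } elseif (!$inQuotes && ($ch === "\n" || $ch === "\r")) {
            // Treat "\r\n" as a single record terminator.
            if ($ch === "\r" && $i + 1 < $len && $csv[$i + 1] === "\n") {
                $i++;
            }
            $records[] = $current;
            $current   = '';
        } else {
            $current .= $ch;
        }
    }
    // Keep a final record that has no trailing line break.
    if ($current !== '') {
        $records[] = $current;
    }
    return $records;
}

// Usage: split into records first, then parse each record into fields.
$data = [];
foreach (split_csv_records("a,b\r\n\"x\ny\",2\n") as $record) {
    $data[] = str_getcsv($record, ",", "\"", "\\");
}
print_r($data);
```

The quotes are left in each record on purpose, so that str_getcsv can still interpret them when the record is parsed into fields.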
 [2014-11-17 10:51 UTC] andrzejborkowski at gmail dot com
$csvTestStr = 'mg3150 manual,12,-61%,6,-54%,50%,8.0,1.0,
"canon powershot sx400 is 16mp, 30x zoom mini bridge camera w/ case & sd card",12,?,6,?,50%,50,1.0,
canon printer mg6450,12,20%,6,,50%,-10,1.0,
canon pixma 3150 wireless setup,12,-48%,6,-33%,50%,10,1.0,
mg3150 manual,12,-61%,6,-54%,50%,8.0,1.0,
"canon powershot sx400 is 16mp, 30x zoom mini bridge camera w/ case & sd card",12,?,6,?,50%,50,1.0,
canon printer mg6450,12,20%,6,,50%,-10,1.0,
canon pixma 3150 wireless setup,12,-48%,6,-33%,50%,10,1.0,';
        //$fields = str_getcsv($row,"/n"); #fail
        $rows = preg_split('/[\r\n]{1,2}(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/', $csvTestStr); #split rows properly
        $this->assertTrue(count($rows) === 8);
        foreach ($rows as $row) {
            $fields = str_getcsv($row,',');
            $this->assertTrue(count($fields) === 9);
            if (count($fields) !== 9) {
                debug($fields);
            }
        }
 [2015-05-18 14:42 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2015-05-18 14:42 UTC] cmb@php.net
str_getcsv() is designed to parse a single CSV record into fields (which works as expected, see <http://3v4l.org/f1DXO>).

If \n is given as the delimiter, it splits the string at the line endings, and also honors the enclosure parameter, but only if that character encloses the complete field (in this case one or more lines). This also works as expected, see <http://3v4l.org/UfsLi>.
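
To illustrate the first point, here is a minimal sketch (not the code behind the links above) that passes one record with a quoted, multi-line field to str_getcsv(). The variable names are illustrative, and the escape argument is passed explicitly only to avoid relying on its default.

```php
<?php
// One CSV record whose fourth field is quoted and contains a line break.
$record = "1996,Jeep,Grand Cherokee,\"MUST SELL!\nair, moon roof, loaded\",4799.00";

// Comma as delimiter, double quote as enclosure, backslash as escape.
$fields = str_getcsv($record, ",", "\"", "\\");

// The record is parsed into five fields; the line break stays inside
// the fourth field instead of starting a new record.
var_dump(count($fields));
var_dump($fields[3] === "MUST SELL!\nair, moon roof, loaded");
```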
 [2019-06-17 23:56 UTC] php at myingling dot com
This *IS* a bug.  Here is an example of valid quotation marks and new lines which illustrates the real problem:

https://3v4l.org/HAJh1
 [2019-06-18 06:40 UTC] cmb@php.net
Again, str_getcsv() is supposed to parse a *single* CSV record
(aka. line), exactly like fgetcsv() does.  How to split CSV
contents into records is left to the programmer.
<https://3v4l.org/D8Mqt> uses concrete knowledge of the input data
to correctly do that.
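
One common way to handle a whole CSV document held in a string is to write it to an in-memory stream and let fgetcsv() read record by record, since fgetcsv() keeps reading across line breaks while a quoted field is open. This is only a sketch under stated assumptions and not necessarily what the linked snippet does: the function name parse_csv_string is hypothetical, and the code assumes PHP 7+, comma-delimited input and double-quote enclosures.

```php
<?php
// Hypothetical helper: parse a multi-record CSV string into rows of fields.
function parse_csv_string(string $csv): array
{
    // php://memory gives a temporary read/write stream backed by RAM.
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $csv);
    rewind($stream);

    $rows = [];
    // Length 0 means no line-length limit; the escape is passed explicitly.
    while (($fields = fgetcsv($stream, 0, ",", "\"", "\\")) !== false) {
        // fgetcsv() reports a blank line as an array with a single null.
        if ($fields === [null]) {
            continue;
        }
        $rows[] = $fields;
    }
    fclose($stream);
    return $rows;
}

// Usage with a record whose Description field spans two lines.
$csv = "Year,Make,Model,Description,Price\n"
     . "1996,Jeep,Grand Cherokee,\"MUST SELL!\nair, moon roof, loaded\",4799.00\n";
print_r(parse_csv_string($csv));
```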
 