Bug #31074 Wrong behavior in fgetcsv if field contains newlines AND \ followed by "
Submitted: 2004-12-13 21:30 UTC Modified: 2005-01-21 00:34 UTC
From: b_ulrich at t-online dot de Assigned: iliaa (profile)
Status: Not a bug Package: Filesystem function related
PHP Version: 4.3.10 OS: Linux
Private report: No CVE-ID: None
 [2004-12-13 21:30 UTC] b_ulrich at t-online dot de
Description:
------------
If a CSV field contains something like ...\"...\"... followed by a newline, fgetcsv treats the newline as the end of the data row even though it is not the end of the field.

There's no problem if the field only contains text and newlines or only ...\"...\"...

If you create a spreadsheet (2 rows with 3 columns) containing this:

<Row 1 Column 1>: R1C1
<Row 1 Column 2>: say \"YES\" to
element in next line
<Row 1 Column 3>: R1C3
<Row 2 Column 1>: R2C1
<Row 2 Column 2>: something with
multilines
<Row 2 Column 3>: R2C3

and export this to CSV, you will get:

R1C1;"say \""YES\"" to
element in next line";R1C3
R2C1;"something with
multilines";R2C3

This is correct standard CSV. But if you try to read it with fgetcsv, you get a result containing 3 rows: the first and the second contain 2 elements each, and the third contains 3 elements.

Reproduce code:
---------------
<?php

$csv_data="R1C1;\"say \\\"\"YES\\\"\" to\nelement in next line\";R1C3\nR2C1;\"something with\nmultilines\";R2C3\n";

$filename = "/tmp/test.csv";
$handle = fopen($filename, "wb");
fwrite($handle, $csv_data,strlen($csv_data));
fclose($handle);

$handle = fopen($filename, "rb");
$result = array(); // initialize so print_r() never sees an undefined variable
while (($data = fgetcsv($handle, 100000, ";", "\"")) !== FALSE) {
  $result[] = $data;
}
fclose($handle);

print_r($result);

?>


Expected result:
----------------
Array
(
    [0] => Array
        (
            [0] => R1C1
            [1] => say \"YES\" to
element in next line
            [2] => R1C3
        )

    [1] => Array
        (
            [0] => R2C1
            [1] => something with
multilines
            [2] => R2C3
        )

)


Actual result:
--------------
Array
(
    [0] => Array
        (
            [0] => R1C1
            [1] => say \"YES\" to
        )

    [1] => Array
        (
            [0] => element in next line"
            [1] => R1C3
        )

    [2] => Array
        (
            [0] => R2C1
            [1] => something with
multilines
            [2] => R2C3
        )

)


Note: There is no problem if there is no line break in Row 1 Column 2, and there is also no problem if the line does not contain the backslashes.


 [2005-01-18 01:13 UTC] iliaa@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

In un-quoted strings a new line breaks the line.
 [2005-01-19 01:18 UTC] b_ulrich at t-online dot de
> In un-quoted strings a new line breaks the line.

That's right, but if you have a look at the example code, you will see that in the generated file the fields which contain the line breaks are quoted!

The content of the generated file looks as follows:

R1C1;"say \""YES\"" to
element in next line";R1C3
R2C1;"something with
multilines";R2C3

So that is correct CSV: the double quotes are escaped by a second double quote, and the whole field containing the line breaks is also enclosed in double quotes.

So the result of fgetcsv() is wrong for this correct CSV-data.

Maybe you were confused by the example in the description, which was only meant to show the user's view of the correct data.
 [2005-01-19 19:08 UTC] b_ulrich at t-online dot de
I tried http://snaps.php.net/php4-STABLE-latest.tar.gz (php4-STABLE-200501191730),
but the result is still the same as in 4.3.10.
 [2005-01-21 00:34 UTC] iliaa@php.net
What you have is an invalid CSV (embedded quotes) as far as PHP is concerned.
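
Note for readers on newer PHP versions: fgetcsv treats the backslash as an escape character, which is why the \" sequences in the sample data interfere with the doubled-quote convention. The following is a minimal sketch, assuming PHP 7.4 or later, where passing an empty string as the fifth ($escape) argument disables that backslash handling. This option does not exist in PHP 4.3.10, so it does not apply to the version in this report. The php://memory stream is used here only to avoid writing a temporary file.

```php
<?php
// Assumption: PHP 7.4 or later. An empty $escape string disables the
// backslash escape mechanism, so only doubled quotes ("") are special.
$csv_data = "R1C1;\"say \\\"\"YES\\\"\" to\nelement in next line\";R1C3\nR2C1;\"something with\nmultilines\";R2C3\n";

// Write the sample data to an in-memory stream and rewind for reading.
$handle = fopen("php://memory", "w+b");
fwrite($handle, $csv_data);
rewind($handle);

$result = array();
// Length 0 means no line length limit; delimiter ";", enclosure "\"", escape "".
while (($data = fgetcsv($handle, 0, ";", "\"", "")) !== false) {
    $result[] = $data;
}
fclose($handle);

print_r($result);
```

Under these assumptions the loop should yield two rows of three fields each, with the line breaks preserved inside the second field of each row.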
 