Bug #50686 fputcsv and fgetcsv doesn't follow CSV format
Submitted: 2010-01-07 16:05 UTC Modified: 2010-01-11 14:23 UTC
Votes:94
Avg. Score:4.5 ± 0.8
Reproduced:81 of 83 (97.6%)
Same Version:49 (60.5%)
Same OS:52 (64.2%)
From: thuejk at gmail dot com Assigned: iliaa (profile)
Status: Wont fix Package: Filesystem function related
PHP Version: 5.*, 6 OS: *
Private report: No CVE-ID: None
 [2010-01-07 16:05 UTC] thuejk at gmail dot com
Description:
------------
According to http://en.wikipedia.org/wiki/Comma-separated_values (which I assume is quoting RFC 4180), CSV does not have an escape character, but instead quotes fields with ", and escapes '"' with '""'.

But fputcsv() escapes '"' with '\"', and fgetcsv() incorrectly chokes on the fragment '\""' (which should be unescaped to '\"').

This is a problem for StarOffice, for example, which actually implements the standard and (understandably) chokes on fputcsv() output like
"\"",2,3
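
For comparison, here is a minimal sketch of the quoting rule described above, assuming RFC 4180 semantics: a field is quoted when it contains a comma, a quote, or a line break, and the only escaping is doubling the quote. The function name rfc4180_encode_row() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): encode one row as RFC 4180 describes.
function rfc4180_encode_row(array $fields): string
{
    $out = [];
    foreach ($fields as $field) {
        $field = (string) $field;
        // Quote only when the field contains a delimiter, a quote, or a line break.
        if (strpbrk($field, ",\"\r\n") !== false) {
            // The single escaping rule: a literal " becomes "". Backslash is not special.
            $field = '"' . str_replace('"', '""', $field) . '"';
        }
        $out[] = $field;
    }
    // RFC 4180 terminates each record with CRLF.
    return implode(',', $out) . "\r\n";
}

echo rfc4180_encode_row(['a\\"b', 2, 3]); // "a\""b",2,3
```

The backslash passes through untouched, so the quote after it is still doubled.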

I haven't tested if this is also a problem in MS office, but it is if MS Office implements the CSV standard correctly.

Note that the fputcsv() manual at http://dk.php.net/manual/en/function.fputcsv.php says: "Format line as CSV", and similarly with fgetcsv(), so there is no doubt that the PHP functions should follow the RFC.

This bug seems to be the result of bug report http://bugs.php.net/bug.php?id=22382 , which is obviously bogus, but was "fixed" by breaking php's CSV support.

Reproduce code:
---------------
<?php
echo "<pre>fputcsv:\n";

$matrix = Array(Array('a\\"b',2,3));
$temp_handle = tmpfile();
foreach ($matrix as $row) {
  fputcsv($temp_handle, $row);
}
fseek($temp_handle, 0); // rewind to the start before reading back
$str = fread($temp_handle, 1024*1024*100);
echo $str;
fclose($temp_handle); // this removes the tmpfile() file

echo "\nfgetcsv:\n";
$temp_handle = tmpfile();
fwrite($temp_handle, '"a\\""", 2, 3');
fseek($temp_handle, 0);
$a = fgetcsv($temp_handle);
echo $a[0];
fclose($temp_handle); // this removes the tmpfile() file
?>


Expected result:
----------------
fputcsv:
"a\""b",2,3

fgetcsv:
a\"


Actual result:
--------------
fputcsv:
"a\"b",2,3

fgetcsv:


 [2010-01-07 18:42 UTC] jani@php.net
Ilia, maybe you want to check this out? :)
 [2010-01-08 13:05 UTC] thuejk at gmail dot com
Also, according to RFC 4180, CSV lines should be terminated by \r\n, not \n.

Since CSV is a data exchange format, the line termination should not be dependent on the system generating the CSV file.

fputcsv() currently terminates CSV-lines with \n on my Linux server.
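
As a hedged illustration for readers on newer versions: this sketch assumes PHP 8.1 or later, where fputcsv() accepts an $eol argument, and PHP 7.4 or later, where an empty $escape string is allowed. Neither existed when this report was filed.

```php
<?php
// Assumes PHP 8.1+ ($eol argument) and PHP 7.4+ (empty $escape).
$handle = fopen('php://memory', 'w+');

// '' switches off the backslash escape mechanism;
// "\r\n" asks for the CRLF record terminator that RFC 4180 specifies.
fputcsv($handle, ['a', 'b', 3], ',', '"', '', "\r\n");

rewind($handle);
$line = stream_get_contents($handle);
fclose($handle);

var_dump($line === "a,b,3\r\n"); // bool(true)
```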

----

PS: I just noticed that RFC 4180 is from 2005, so you have some excuse, since the PHP function predates the RFC. However, the RFC cites 4 previous non-RFC definitions, which all agree with the RFC on the important points, so it is not clear that the PHP implementation ever had any right to claim it had anything to do with the CSV format. And of course at least OpenOffice's CSV-support, and probably many other programs, can't work with the input/output of the PHP functions.
 [2010-01-09 15:48 UTC] thuejk at gmail dot com
I have written a test suite (for my own local replacements for fgetcsv()/fputcsv()). So if you reimplement the PHP functions, you might want to send me your new implementation so I can run it through the test suite.
 [2010-01-09 18:46 UTC] jani@php.net
No, you send us the tests as PHPTs, like every other test in our test suite. Not the other way around.
 [2010-01-10 19:21 UTC] thuejk at gmail dot com
I will consider doing the work to port the tests if you will commit to actually use them.

I assume you can't change the functions (until PHP 6 at least), since people may depend on their current behavior. So I assume you will just add a "THIS IS NOT REALLY CSV FORMAT" warning to the manual page, and perhaps add a strict warning when using the functions.

So I assume that any CSV functions will be new functions, which you will have to commit to making first.

So say you will make the CSV functions (and please also make string versions of both functions this time), then I will consider doing the effort of porting my tests :).
 [2010-01-11 02:55 UTC] iliaa@php.net
PHP's behaviour on this matter was established before the spec was
written, and the implementation of fgetcsv() will continue to support the \
escape character.

The fputcsv() function already uses "" for escaping.
 [2010-01-11 14:23 UTC] thuejk at gmail dot com
The spec existed before PHP's function was written, just not in RFC form, but in actual program implementations. For example I just tested Excel from Office 2000, and it also can't recognize CSV output from fputcsv (it chokes on the same string from my first post as OpenOffice).

IMO, just because there was no formal RFC before 2005 does not mean you can choose whatever format rules you like, and still call it CSV. Do you have any examples of CSV implementations which actually use the same format as fgetcsv()? If all spreadsheets disagree (Excel and OpenOffice tested), then you are probably doing something wrong.

In any case, I can understand if you want to keep the current fgetcsv for backwards-compatibility. But you really need to prominently display a warning on its manual page that
   "fgetcsv and fputcsv are incompatible with RFC 4180, Microsoft Excel, OpenOffice Calc, and probably most other CSV implementations."
since the PHP function (misleadingly IMO) contains CSV in its name.

Also, as I mentioned in the first post here, fputcsv does not double the quote for the input string (PHP format) '\\"' - the output of that is (PHP format) '\\"'.
 [2010-04-30 22:35 UTC] rememb70 at yahoo dot com
Thanks for all your hard work PHP!

Forgive the ego of thuejk and provide replacement functions.  It was very disappointing to fully implement code utilizing these and then discover that they are incompatible with actual csv files.

Thanks.
 [2010-05-30 12:38 UTC] david at davidhoulder dot com
I've written a CSV parser in PHP that could easily be converted to C to form the basis of a fix for this.

Go to
http://www.ubercart.org/project/uc_stock_update-2x
and grab the latest uc_stock_update-*.tar.gz

In that tarball you'll find uc_stock_update/ReadCSV.inc which uses a finite state machine to parse a CSV file. I think it handles RFC4180-compliant files correctly, although it currently tolerates many error cases. There's a little test suite in uc_stock_update/tests/ . Feedback welcome (see the download page).
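
To make the finite-state-machine idea concrete, here is a minimal sketch of such a parser in PHP. It is not the ReadCSV.inc code from the tarball; the function name parse_rfc4180() and the state names are hypothetical. Like the parser described above, it assumes RFC 4180 quoting and tolerates some malformed input instead of rejecting it.

```php
<?php
// Hypothetical sketch (not ReadCSV.inc): parse a CSV string into rows of fields.
// States: start (beginning of a field), unquoted, quoted, quote_seen.
function parse_rfc4180(string $csv): array
{
    $rows = [];
    $row = [];
    $field = '';
    $state = 'start';
    $len = strlen($csv);

    for ($i = 0; $i < $len; $i++) {
        $c = $csv[$i];

        if ($state === 'quoted') {
            if ($c === '"') {
                $state = 'quote_seen'; // closing quote, or first half of ""
            } else {
                $field .= $c;          // commas, backslashes, line breaks are literal
            }
            continue;
        }
        if ($state === 'quote_seen' && $c === '"') {
            $field .= '"';             // "" inside quotes is one literal quote
            $state = 'quoted';
            continue;
        }

        // Outside quotes from here on.
        if ($c === '"' && $state === 'start') {
            $state = 'quoted';
        } elseif ($c === ',') {
            $row[] = $field;
            $field = '';
            $state = 'start';
        } elseif ($c === "\r" || $c === "\n") {
            // Treat CRLF as a single record terminator; accept bare CR or LF too.
            if ($c === "\r" && $i + 1 < $len && $csv[$i + 1] === "\n") {
                $i++;
            }
            $row[] = $field;
            $rows[] = $row;
            $row = [];
            $field = '';
            $state = 'start';
        } else {
            $field .= $c;              // lenient: stray characters are kept as-is
            $state = 'unquoted';
        }
    }

    // Flush the last record when the input does not end with a line break.
    if ($state !== 'start' || $field !== '' || $row !== []) {
        $row[] = $field;
        $rows[] = $row;
    }
    return $rows;
}

$rows = parse_rfc4180('"a\\""",2,3' . "\r\n");
var_dump($rows[0][0] === 'a\\"'); // bool(true)
```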
 [2016-01-05 14:08 UTC] minexew at gmail dot com
Is there any chance that this will be fixed?
As of today, the documentation for fgetcsv is still misleading and there is no simple workaround.
 [2016-09-23 17:17 UTC] office at williamhollandschool dot com
I see that the versions entry only lists 5.* and 6. I use PHP 7 and the "escape" character thing has caused me no end of grief in trying to implement import and export of natural language documents into and out of CSV format to later be interpreted by MariaDB's LOAD INFILE. Even trying the workaround that's posted everywhere of putting the "escape" character the same as the enclosure character only caused it to fail to escape any enclosure characters. Having to generate then fix giant files by hand is what has driven me to Python for this task. Many, many things are still using CSV. Not the most important bug in PHP 7, but one that has annoyed me to no end.
 [2017-02-09 17:03 UTC] aleksey dot spec at gmail dot com
Please, change the behaviour or provide another function. We have problems while preparing csv-files for ClickHouse.
When I write
$string = '\\"';
fputcsv($fileHandle, [$string]);
I expect to see this in my file:
"\"""
but currently I see 
"\""

CSV is often used for communication between different environments, not only inside PHP with fputcsv/fgetcsv, so it would be much better if the generated CSV file obeyed some commonly used standard like RFC 4180.

P.S. Please do not take this personally. I really love PHP and I'd like it to become even more beautiful and popular.
 [2017-04-11 18:43 UTC] plenty dot su at gmail dot com
I mentioned in: https://bugs.php.net/bug.php?id=43225
A workaround so far is to disable escape method by: fputcsv($handle, $array, ',', '"', "\0")
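
A short sketch of that workaround as a round trip, using the '\\"' input from the earlier comments. It assumes the same "\0" escape is also passed to fgetcsv(); on PHP 7.4 and later, an empty string '' can reportedly be passed instead to disable the escape mechanism.

```php
<?php
$handle = fopen('php://memory', 'w+');

// "\0" as the escape character, so the backslash is no longer treated specially.
fputcsv($handle, ['\\"', 2, 3], ',', '"', "\0");

rewind($handle);
$written = stream_get_contents($handle);

// Read the same line back with the same escape setting.
rewind($handle);
$read = fgetcsv($handle, 0, ',', '"', "\0");
fclose($handle);

echo $written;                 // "\""",2,3
var_dump($read[0] === '\\"');  // bool(true)
```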
 [2019-10-17 14:04 UTC] durinf at gmail dot com
Hey, if ya'll are too lazy to fix this, how about at least deprecating it?
Sheesh.
 [2019-10-17 14:11 UTC] bugreports at gmail dot com
What would you gain by deprecating it?

It would only cause trouble for users who have relied on it for their workloads for 15 years without any issue, and you still won't have something matching your needs.
 [2022-07-13 07:24 UTC] myfordbenefits dot cc at gmail dot com
Facing the same issue here. Help is appreciated. Thanks for the info; I will try to figure it out further.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Oct 08 22:01:27 2024 UTC