Bug #50686 fputcsv and fgetcsv doesn't follow CSV format
Submitted: 2010-01-07 16:05 UTC
Modified: 2010-01-11 14:23 UTC
Votes: 51
Avg. Score: 4.6 ± 0.8
Reproduced: 43 of 43 (100.0%)
Same Version: 36 (83.7%)
Same OS: 32 (74.4%)
From: thuejk at gmail dot com
Assigned: iliaa
Status: Wont fix
Package: Filesystem function related
PHP Version: 5.*, 6
OS: *
Private report: No
CVE-ID:

 [2010-01-07 16:05 UTC] thuejk at gmail dot com
Description:
------------
According to http://en.wikipedia.org/wiki/Comma-separated_values (which I assume is quoting RFC 4180), CSV does not have an escape character, but instead quotes fields with ", and escapes '"' with '""'.

But fputcsv() treats '\' as an escape character, so a '"' that follows a backslash is written as '\"' instead of being doubled, and fgetcsv() incorrectly chokes on the fragment '\""' (which should be unescaped to '\"').
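For comparison, here is a minimal sketch of the RFC 4180 quoting rule described above, written in the same language as the reproduce code. The function name `rfc4180_field()` is hypothetical (it is not a PHP built-in); it doubles every `"` and treats the backslash as an ordinary character.

```php
<?php
// Hypothetical helper (not part of PHP): quote a single field per RFC 4180.
function rfc4180_field(string $field): string
{
    // Only fields containing a comma, a double quote, CR or LF need enclosing.
    if (strpbrk($field, ",\"\r\n") === false) {
        return $field;
    }
    // Inside the enclosure every '"' becomes '""'; '\' has no special meaning.
    return '"' . str_replace('"', '""', $field) . '"';
}

echo rfc4180_field('a\\"b'), "\n"; // "a\""b"
echo rfc4180_field('plain'), "\n"; // plain
```

The output for `a\"b` is exactly the "Expected result" line from the report.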

This is a problem for, for example, StarOffice, which actually implements the standard and (understandably) chokes on fputcsv() output like
"\"",2,3

I haven't tested whether this is also a problem in MS Office, but it will be if MS Office implements the CSV standard correctly.

Note that the fputcsv() manual at http://dk.php.net/manual/en/function.fputcsv.php says: "Format line as CSV", and similarly with fgetcsv(), so there is no doubt that the PHP functions should follow the RFC.

This bug seems to be the result of bug report http://bugs.php.net/bug.php?id=22382, which is obviously bogus, but was "fixed" by breaking PHP's CSV support.

Reproduce code:
---------------
<?php
echo "<pre>fputcsv:\n";

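// The field is the 4-character string a\"b (a backslash followed by a double quote).
$matrix = Array(Array('a\\"b',2,3));
$temp_handle = tmpfile();
foreach ($matrix as $row) {
  fputcsv($temp_handle, $row);
}
rewind($temp_handle); // read back what fputcsv() wrote
$str = stream_get_contents($temp_handle);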
echo $str;
fclose($temp_handle); // this removes the tmpfile() file

echo "\nfgetcsv:\n";
$temp_handle = tmpfile();
// Raw line: "a\""", 2, 3; per RFC 4180 the first field is a\"
fwrite($temp_handle, '"a\\""", 2, 3');
fseek($temp_handle, 0);
$a = fgetcsv($temp_handle);
echo $a[0];
fclose($temp_handle); // this removes the tmpfile() file
?>


Expected result:
----------------
fputcsv:
"a\""b",2,3

fgetcsv:
a\"


Actual result:
--------------
fputcsv:
"a\"b",2,3

fgetcsv:


History
 [2010-01-07 18:42 UTC] jani@php.net
Ilia, maybe you want to check this out? :)
 [2010-01-08 13:05 UTC] thuejk at gmail dot com
Also, according to RFC 4180, CSV lines should be terminated by \r\n, not \n.

Since CSV is a data exchange format, the line termination should not be dependent on the system generating the CSV file.

fputcsv() currently terminates CSV lines with \n on my Linux server.
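A sketch of a record writer that follows the RFC's line-termination rule regardless of platform, assuming PHP 7 or later (scalar type declarations). `rfc4180_row()` is a hypothetical name, not a PHP function. (PHP 8.1 later added an `eol` parameter to fputcsv(); this sketch does not depend on it.)

```php
<?php
// Hypothetical helper (not part of PHP): format one record per RFC 4180.
// The record always ends in CRLF, whatever OS generates the file.
function rfc4180_row(array $fields): string
{
    $out = [];
    foreach ($fields as $value) {
        $value = (string) $value;
        // Enclose only when needed, doubling any embedded double quotes.
        $out[] = strpbrk($value, ",\"\r\n") === false
            ? $value
            : '"' . str_replace('"', '""', $value) . '"';
    }
    return implode(',', $out) . "\r\n";
}

echo rfc4180_row(['a\\"b', 2, 3]); // "a\""b",2,3 followed by \r\n
```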

----

PS: I just noticed that RFC 4180 is from 2005, so you have some excuse, since the PHP function predates the RFC. However, the RFC cites 4 earlier non-RFC definitions, which all agree with the RFC on the important points, so it is not clear that the PHP implementation ever had any right to claim it had anything to do with the CSV format. And in any case, at least OpenOffice's CSV support, and probably many other programs, can't work with the input/output of the PHP functions.
 [2010-01-09 15:48 UTC] thuejk at gmail dot com
I have written a test suite (for my own local replacement functions for fgetcsv/fputcsv). So if you reimplement the PHP functions, you might want to send me your new implementation so I can run it through the test suite.
 [2010-01-09 18:46 UTC] jani@php.net
No, you send us the tests as PHPT files, like every other test in our test suite. Not the other way around.
 [2010-01-10 19:21 UTC] thuejk at gmail dot com
I will consider doing the work to port the tests if you will commit to actually using them.

I assume you can't change the functions (until PHP 6 at least), since people may depend on their current behavior. So I assume you will just add a "THIS IS NOT REALLY CSV FORMAT" warning to the manual page, and perhaps add a strict warning when using the functions.

So I assume that any CSV functions will be new functions, which you will have to commit to making first.

So if you say you will make the CSV functions (and please also make string versions of both functions this time), then I will consider making the effort to port my tests :).
 [2010-01-11 02:55 UTC] iliaa@php.net
PHP's behaviour on this matter was established before the spec was written, and the implementation of fgetcsv() will continue to support the \ escape character.

The fputcsv() function already uses "" for escaping.
 [2010-01-11 14:23 UTC] thuejk at gmail dot com
The spec existed before PHP's function was written, just not in RFC form but in actual program implementations. For example, I just tested Excel from Office 2000, and it also can't recognize CSV output from fputcsv (it chokes on the same string from my first post that OpenOffice chokes on).

IMO, just because there was no formal RFC before 2005 does not mean you can choose whatever format rules you like and still call it CSV. Do you have any examples of CSV implementations which actually use the same format as fgetcsv()? If all spreadsheets disagree (Excel and OpenOffice tested), then you are probably doing something wrong.

In any case, I can understand if you want to keep the current fgetcsv for backwards-compatibility. But you really need to prominently display a warning on its manual page that
   "fgetcsv and fputcsv are incompatible with RFC 4180, Microsoft Excel, OpenOffice Calc, and probably most other CSV implementations."
since the PHP functions (misleadingly, IMO) contain CSV in their names.

Also, as I mentioned in the first post here, fputcsv does not double the quote in the input string '\\"' (PHP single-quoted notation, i.e. a backslash followed by a double quote): it is written out unchanged as '\\"'.
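The behaviour discussed in the last two comments can be reproduced with PHP's own fputcsv() by passing the escape character explicitly. This is a sketch assuming PHP 7.4 or later, where an empty `escape` argument disables the backslash mechanism; `row_to_csv()` is a hypothetical wrapper that writes to a `php://memory` stream and returns the line.

```php
<?php
// Hypothetical wrapper: run fputcsv() on one row and return the written line.
function row_to_csv(array $fields, string $escape): string
{
    $h = fopen('php://memory', 'r+');
    fputcsv($h, $fields, ',', '"', $escape);
    rewind($h);
    $line = stream_get_contents($h);
    fclose($h);
    return $line;
}

// A lone quote is doubled ...
echo row_to_csv(['"'], '\\');      // """"
// ... but a quote preceded by the escape character is not:
echo row_to_csv(['a\\"b'], '\\');  // "a\"b"
// With the escape mechanism disabled (PHP >= 7.4) the output matches RFC 4180:
echo row_to_csv(['a\\"b'], '');    // "a\""b"
```

Each line ends with fputcsv()'s default "\n", not the CRLF that RFC 4180 asks for.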
 [2010-04-30 22:35 UTC] rememb70 at yahoo dot com
Thanks for all your hard work PHP!

Forgive the ego of thuejk and provide replacement functions. It was very disappointing to fully implement code utilizing these and then discover that they are incompatible with actual CSV files.

Thanks.
 [2010-05-30 12:38 UTC] david at davidhoulder dot com
I've written a CSV parser in PHP that could easily be converted to C to form the basis of a fix for this.

Go to
http://www.ubercart.org/project/uc_stock_update-2x
and grab the latest uc_stock_update-*.tar.gz

In that tarball you'll find uc_stock_update/ReadCSV.inc, which uses a finite state machine to parse a CSV file. I think it handles RFC 4180-compliant files correctly, although it currently tolerates many error cases. There's a little test suite in uc_stock_update/tests/. Feedback welcome (see the download page).
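The state-machine approach can be sketched in a few lines. This is not the code from ReadCSV.inc (which is not reproduced here); `rfc4180_parse_line()` is a hypothetical function that parses one record already stripped of its line terminator, treats only `""` as special inside quotes, and, like the parser described above, tolerates some malformed input instead of rejecting it.

```php
<?php
// Hypothetical two-state parser (not ReadCSV.inc): splits one RFC 4180 record.
// The caller must strip the trailing line break and must not pass records
// whose quoted fields span several lines.
function rfc4180_parse_line(string $line): array
{
    $fields = [];
    $field = '';
    $inQuotes = false; // state: inside an enclosed field or not
    $len = strlen($line);
    for ($i = 0; $i < $len; $i++) {
        $c = $line[$i];
        if ($inQuotes) {
            if ($c !== '"') {
                $field .= $c;            // backslash is an ordinary character
            } elseif ($i + 1 < $len && $line[$i + 1] === '"') {
                $field .= '"';           // "" is one literal quote
                $i++;
            } else {
                $inQuotes = false;       // closing quote
            }
        } elseif ($c === '"') {
            $inQuotes = true;            // opening quote (tolerated mid-field)
        } elseif ($c === ',') {
            $fields[] = $field;          // end of field
            $field = '';
        } else {
            $field .= $c;
        }
    }
    $fields[] = $field;
    return $fields;
}

$fields = rfc4180_parse_line('"a\\""", 2, 3');
echo $fields[0], "\n"; // a\"
```

Applied to the line from the bug report, the first field comes out as `a\"`, the "Expected result" for fgetcsv().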
 
PHP Copyright © 2001-2014 The PHP Group
All rights reserved.
Last updated: Thu Apr 24 02:02:10 2014 UTC