[2007-02-15 20:11 UTC] mike at opendns dot com
Description:
------------
If an element in a CSV file ends with an odd number of trailing backslashes, it'll miss the enclosure character.

This isn't a documentation problem - http://www.rfc-editor.org/rfc/rfc4180.txt

Backslashes are not escape characters in CSV.

This was part of bug #39538. The other half of that bug was correctly fixed. This is still broken.

Reproduce code:
---------------
mikef@dev5:/tmp# cat -A csv.tmp
"this element contains the delimiter, and ends with an odd number of backslashes (ex: 1)\",and it isn't the last element$

mikef@dev5:/tmp# cat test_csv.php
<?php
$file = '/tmp/csv.tmp';
$h = fopen($file, 'r');
$data = fgetcsv($h);
fclose($h);
var_dump($data);
?>

Expected result:
----------------
mikef@dev5:/tmp# php test_csv.php
array(2) {
  [0]=>
  string(88) "this element contains the delimiter, and ends with an odd number of backslashes (ex: 1)\"
  [1]=>
  string(29) "and it isn't the last element"
}

Actual result:
--------------
mikef@dev5:/tmp# php test_csv.php
array(1) {
  [0]=>
  string(120) "this element contains the delimiter, and ends with an odd number of backslashes (ex: 1)\",and it isn't the last element"
}
Further thoughts on this...

The escape logic needs to be replaced according to the double-quote rule in the above documentation (rfc4180.txt), modified as described below. The rule reads:

*****************************
7. If double-quotes are used to enclose fields, then a double-quote
   appearing inside a field must be escaped by preceding it with
   another double quote. For example:

   "aaa","b""bb","ccc"
*****************************

However, fgetcsv needs to replace the double quote in the above rule with the character supplied in the enclosure parameter. In other words, the escape character is always the enclosure character itself, and it can only escape that same character.

Please email me if help is needed with analysis or testing.
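
To make the proposed rule concrete, here is a minimal sketch of a record splitter in which the enclosure character can only be escaped by doubling it, and a backslash is an ordinary character. The function name parse_csv_record is hypothetical and is not part of PHP; this is not how fgetcsv is implemented. It assumes a single record with no trailing newline and no fields spanning multiple lines.

```php
<?php
// Hypothetical helper (not a PHP built-in): splits one CSV record using
// RFC 4180 style quoting, generalized to an arbitrary enclosure character.
// Assumes the record has no trailing newline and no multi-line fields.
function parse_csv_record(string $line, string $delimiter = ',', string $enclosure = '"'): array
{
    $fields = [];
    $field = '';
    $inQuotes = false;
    $len = strlen($line);

    for ($i = 0; $i < $len; $i++) {
        $ch = $line[$i];

        if ($inQuotes) {
            if ($ch === $enclosure) {
                if ($i + 1 < $len && $line[$i + 1] === $enclosure) {
                    // Doubled enclosure: emit one literal enclosure character.
                    $field .= $enclosure;
                    $i++;
                } else {
                    // Single enclosure: the quoted section ends here.
                    $inQuotes = false;
                }
            } else {
                // Everything else, backslashes included, is literal.
                $field .= $ch;
            }
        } elseif ($ch === $enclosure) {
            $inQuotes = true;
        } elseif ($ch === $delimiter) {
            $fields[] = $field;
            $field = '';
        } else {
            $field .= $ch;
        }
    }

    $fields[] = $field;
    return $fields;
}

// The first field ends with a single backslash, the second contains
// a doubled enclosure, the third is unquoted.
$record = '"ends with a backslash\\","b""bb",ccc';
var_dump(parse_csv_record($record));
```

With this rule the trailing backslash stays part of the first field, and the quote that follows it closes the field as expected. The same function works for any enclosure character, for example parse_csv_record("'it''s';x", ';', "'").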