Doc Bug #66727 fgetcsv handles default escape character ('\') inconsistently
Submitted: 2014-02-17 11:48 UTC Modified: 2018-02-15 17:15 UTC
Votes:2
Avg. Score:4.0 ± 1.0
Reproduced:0 of 0 (0.0%)
From: ikonta at yandex dot ru Assigned: cmb (profile)
Status: Duplicate Package: Filesystem function related
PHP Version: 5.5.9 OS: Gentoo GNU/Linux, 17.02.2014
Private report: No CVE-ID: None
 [2014-02-17 11:48 UTC] ikonta at yandex dot ru
Description:
------------
This issue is similar to bug #66588, but not identical.
Bug #33847 describes almost exactly the same issue, but it doesn't provide a solution.
Bug #46872 doesn't match the current documentation: http://ru2.php.net/manual/en/function.fgetcsv.php

array fgetcsv ( resource $handle [, int $length = 0 [, string $delimiter = ',' [, string $enclosure = '"' [, string $escape = '\\' ]]]] )
…
 escape
    Set the escape character (one character only). Defaults as a backslash.


Last tested on an amd64 build of =dev-lang/php-5.5.9:
Installed versions:  5.4.23(5.4)(11:45:23 04.02.2014)(apache2 berkdb bzip2 calendar cli crypt ctype fileinfo filter gd gdbm hash iconv json nls oci8-instant-client phar posix readline session simplexml ssl tokenizer unicode xml xmlreader xmlwriter xpm zip zlib -bcmath -cdb -cgi -cjk -curl -curlwrappers -debug -embed -enchant -exif -firebird -flatfile -fpm -frontbase -ftp -gmp -imap -inifile -intl -iodbc -ipv6 -kerberos -ldap -ldap-sasl -libedit -mhash -mssql -mysql -mysqli -mysqlnd -odbc -pcntl -pdo -postgres -qdbm -recode -selinux -sharedmem -snmp -soap -sockets -spell -sqlite -sybase-ct -sysvipc -threads -tidy -truetype -wddx -xmlrpc -xslt) 5.5.9(5.5)(13:59:23 17.02.2014)(apache2 berkdb bzip2 calendar cli crypt ctype fileinfo filter gd gdbm hash iconv json nls oci8-instant-client opcache phar posix readline session simplexml ssl tokenizer unicode xml xmlreader xmlwriter xpm zip zlib -bcmath -cdb -cgi -cjk -curl -debug -embed -enchant -exif -firebird -flatfile -fpm -frontbase -ftp -gmp -imap -inifile -intl -iodbc -ipv6 -kerberos -ldap -ldap-sasl -libedit -libmysqlclient -mhash -mssql -mysql -mysqli -odbc -pcntl -pdo -postgres -qdbm -recode -selinux -sharedmem -snmp -soap -sockets -spell -sqlite -sybase-ct -systemd -sysvipc -threads -tidy -truetype -wddx -xmlrpc -xslt)

A check with =dev-lang/php-5.4.25 is also planned.

Test files:

Step 1. Incorrect three-field CSV file:
test.CSV 
"05.10.13 12:30:06";"тестовая строка текста \";"12345"
Note: the escape character ('\') is in the last data position of the middle (2nd) field.

Step 2. Corrected CSV file (with the missing escape character added):
test.CSV 
"05.10.13 12:30:06";"тестовая строка текста \\";"12345"


I expect the escape character to be consumed while parsing the string and not copied to the output.

Test script:
---------------
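The test script used to check this:

<?php
  // Open the test file; stop early if it cannot be read.
  $file_ptr = fopen("/tmp/test.CSV", "r");
  if ($file_ptr === false) {
    exit("Cannot open /tmp/test.CSV\n");
  }

  // Parse each record with ';' as delimiter and the default enclosure and escape.
  while (($record = fgetcsv($file_ptr, 0, ";")) !== false) {
    $record_fields = count($record);
    echo "Parsed $record_fields fields in the line\n";
    for ($i = 0; $i < $record_fields; $i++) {
      echo $record[$i];
      echo "\n";
    }
  }
  fclose($file_ptr);
?>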

Expected result:
----------------
For the first file I expect a warning or error message about an incorrect or incomplete record,
followed either by an abort or, depending on extra flags, by skipping the bad record and continuing to parse the file.

For the second file I expect three fields, without the extra escape character:
05.10.13 12:30:06
тестовая строка текста \
12345



Actual result:
--------------
The first file is parsed into two fields:
05.10.13 12:30:06
тестовая строка текста \";12345"
The opening double quote of the last field (which is now concatenated with the 2nd) is missing.
The escape character ('\') is copied to the output together with the escaped character ('"').

The second file is parsed into three fields as expected:
05.10.13 12:30:06
тестовая строка текста \\
12345

Note: the escape character ('\') is copied to the output together with the escaped one.

History

 [2016-08-26 13:36 UTC] cmb@php.net
Related to bug #67566.
 [2018-02-15 17:15 UTC] cmb@php.net
-Status: Open
+Status: Duplicate
-Type: Bug
+Type: Documentation Problem
-Assigned To:
+Assigned To: cmb
 [2018-02-15 17:15 UTC] cmb@php.net
> Corrected CSV file (with adding missed escape character):

Actually, this is not supposed to fix the issue, since the $escape
character is *solely* treated as escape character immediately
before the $enclosure character.  This has just been fixed in the
docs (<http://svn.php.net/viewvc?view=revision&revision=344268>).
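
For anyone who wants to reproduce the behaviour without a file on disk,
here is a minimal sketch. The helper name parse_line() and the use of a
php://memory stream are illustrative assumptions, not part of the report,
and the sample rows use an ASCII stand-in for the reporter's text. The
escape character is passed explicitly as a backslash, which is the
documented default.

```php
<?php
// Hypothetical helper (not part of PHP): parse one CSV line held in memory.
function parse_line(string $line, string $escape): array
{
    $handle = fopen('php://memory', 'w+');
    fwrite($handle, $line . "\n");
    rewind($handle);
    $record = fgetcsv($handle, 0, ';', '"', $escape);
    fclose($handle);
    if ($record === false) {
        throw new RuntimeException('No record could be read');
    }
    return $record;
}

// Row from step 1: a single backslash right before the closing enclosure.
$first = parse_line('"05.10.13 12:30:06";"test line of text \";"12345"', '\\');

// Row from step 2: the backslash doubled (written as \\\\ in a PHP string).
$second = parse_line('"05.10.13 12:30:06";"test line of text \\\\";"12345"', '\\');

// Print the field count and the raw fields so the kept backslashes are visible.
foreach ([$first, $second] as $record) {
    echo count($record), " field(s)\n";
    foreach ($record as $field) {
        echo "  ", $field, "\n";
    }
}
```

If this matches the report, the first row yields two fields and the
second yields three, with every backslash still present in the output.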

> Incorrect three filed CSV file:

Actually, this is a correct CSV file according to RFC 4180[1].  It
is not properly treated by fgetcsv(), though, unless you change
the $escape character to a character which is not contained in the
file (e.g. "\0").  There is already feature request #51496, which
would allow setting the $escape character to an empty string, to
avoid the doubtful escaping.
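
The workaround above can be sketched as follows. This is only an
illustration: the helper name read_record() and the php://memory stream
are assumptions made for the example, the sample row is an ASCII
stand-in for the reporter's data, and "\0" is only a safe choice if the
file is known not to contain NUL bytes.

```php
<?php
// Hypothetical helper (not part of PHP): read one record from an in-memory stream.
function read_record(string $line, string $escape): array
{
    $handle = fopen('php://memory', 'w+');
    fwrite($handle, $line . "\n");
    rewind($handle);
    $record = fgetcsv($handle, 0, ';', '"', $escape);
    fclose($handle);
    if ($record === false) {
        throw new RuntimeException('No record could be read');
    }
    return $record;
}

// Row from step 1 of the report: the second field ends with a backslash.
$line = '"05.10.13 12:30:06";"test line of text \";"12345"';

// Assumption: a NUL byte never occurs in the data, so it can stand in as
// the escape character and the backslash is treated as ordinary text.
$fields = read_record($line, "\0");

foreach ($fields as $field) {
    echo $field, "\n";
}
```

With the escape character moved out of the way, the row should split
into three fields, and the second field keeps its trailing backslash.
As far as I know, PHP 7.4 and later also accept an empty string for
$escape, which disables the escape mechanism entirely.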

Therefore I'm closing this ticket as duplicate.

[1] <https://tools.ietf.org/html/rfc4180>
 