Doc Bug #66727 fgetcsv handles default escape character ('\') inconsistently
Submitted: 2014-02-17 11:48 UTC
Modified: 2018-02-15 17:15 UTC
Votes: 2
Avg. Score: 4.0 ± 1.0
Reproduced: 0 of 0 (0.0%)
From: ikonta at yandex dot ru
Assigned: cmb
Status: Duplicate
Package: Filesystem function related
PHP Version: 5.5.9
OS: Gentoo GNU/Linux, 17.02.2014
Private report: No
CVE-ID: None
 [2014-02-17 11:48 UTC] ikonta at yandex dot ru
Description:
------------
This issue is similar to bug #66588, but not identical.
Bug #33847 describes almost exactly the same issue, but it doesn't provide a solution.
Bug #46872 doesn't match the actual documentation at http://ru2.php.net/manual/en/function.fgetcsv.php

array fgetcsv ( resource $handle [, int $length = 0 [, string $delimiter = ',' [, string $enclosure = '"' [, string $escape = '\\' ]]]] )
…
 escape
    Set the escape character (one character only). Defaults as a backslash.


Last tested on amd64 build of =dev-lang/php-5.5.9:
Installed versions:  5.4.23(5.4)(11:45:23 04.02.2014)(apache2 berkdb bzip2 calendar cli crypt ctype fileinfo filter gd gdbm hash iconv json nls oci8-instant-client phar posix readline session simplexml ssl tokenizer unicode xml xmlreader xmlwriter xpm zip zlib -bcmath -cdb -cgi -cjk -curl -curlwrappers -debug -embed -enchant -exif -firebird -flatfile -fpm -frontbase -ftp -gmp -imap -inifile -intl -iodbc -ipv6 -kerberos -ldap -ldap-sasl -libedit -mhash -mssql -mysql -mysqli -mysqlnd -odbc -pcntl -pdo -postgres -qdbm -recode -selinux -sharedmem -snmp -soap -sockets -spell -sqlite -sybase-ct -sysvipc -threads -tidy -truetype -wddx -xmlrpc -xslt) 5.5.9(5.5)(13:59:23 17.02.2014)(apache2 berkdb bzip2 calendar cli crypt ctype fileinfo filter gd gdbm hash iconv json nls oci8-instant-client opcache phar posix readline session simplexml ssl tokenizer unicode xml xmlreader xmlwriter xpm zip zlib -bcmath -cdb -cgi -cjk -curl -debug -embed -enchant -exif -firebird -flatfile -fpm -frontbase -ftp -gmp -imap -inifile -intl -iodbc -ipv6 -kerberos -ldap -ldap-sasl -libedit -libmysqlclient -mhash -mssql -mysql -mysqli -odbc -pcntl -pdo -postgres -qdbm -recode -selinux -sharedmem -snmp -soap -sockets -spell -sqlite -sybase-ct -systemd -sysvipc -threads -tidy -truetype -wddx -xmlrpc -xslt)

A check with =dev-lang/php-5.4.25 is also planned.

Test files:

Step 1. Incorrect three-field CSV file:
test.CSV 
"05.10.13 12:30:06";"тестовая строка текста \";"12345"
Note: the escape character ('\') is in the last data position of the middle (2nd) field.

Step 2. Corrected CSV file (with the missing escape character added):
test.CSV 
"05.10.13 12:30:06";"тестовая строка текста \\";"12345"


I expect the escape character to be consumed while parsing the string and not copied to the output.

Test script:
---------------
The test script used to check this:

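<?php
  // Open the test file; stop early if it cannot be read.
  $file_ptr = fopen("/tmp/test.CSV", "r");
  if ($file_ptr === false) {
    exit("Cannot open /tmp/test.CSV\n");
  }

  // Parse each ';'-delimited record with the default enclosure and escape.
  while (($record = fgetcsv($file_ptr, 0, ";")) !== FALSE) {
    $record_fields = count($record);
    echo "Parsed $record_fields fields in the line\n";
    foreach ($record as $field) {
      echo $field, "\n";
    }
  }
  fclose($file_ptr);
?>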

Expected result:
----------------
For the 1st file I expect a warning/error message about an incorrect/incomplete record in the file,
followed either by an emergency exit or, depending on extra flags, by continued parsing after skipping the bad record.

For the 2nd file I expect to see three fields without the extra escape character:
05.10.13 12:30:06
тестовая строка текста \
12345



Actual result:
--------------
The 1st file is parsed into two fields:
05.10.13 12:30:06
тестовая строка текста \";12345"
The opening double quote of the last field (here concatenated with the 2nd) is lost.
The escape character ('\') is copied to the output together with the escaped character ('"').

The 2nd file is parsed into three fields as expected:
05.10.13 12:30:06
тестовая строка текста \\
12345

Note: the escape character ('\') is copied to the output together with the escaped one.

History

 [2016-08-26 13:36 UTC] cmb@php.net
Related to bug #67566.
 [2018-02-15 17:15 UTC] cmb@php.net
-Status: Open
+Status: Duplicate
-Type: Bug
+Type: Documentation Problem
-Assigned To:
+Assigned To: cmb
 [2018-02-15 17:15 UTC] cmb@php.net
> Corrected CSV file (with the missing escape character added):

Actually, this is not supposed to fix the issue, since the $escape
character is *solely* treated as escape character immediately
before the $enclosure character.  This has just been fixed in the
docs (<http://svn.php.net/viewvc?view=revision&revision=344268>).
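
The behaviour described above can be reproduced without a file on disk.
The following is only a minimal sketch: it uses str_getcsv() instead of
fgetcsv(), passes the default enclosure and escape explicitly, and
replaces the Cyrillic field text from the report with the ASCII
stand-in "test text" (an assumption made for illustration).

```php
<?php
// ASCII stand-ins for the two test lines from the report.
// Nowdoc strings are used so that the backslashes are taken literally.
$line1 = <<<'CSV'
"05.10.13 12:30:06";"test text \";"12345"
CSV;
$line2 = <<<'CSV'
"05.10.13 12:30:06";"test text \\";"12345"
CSV;

// Delimiter ';', enclosure '"', escape '\' (the documented default).
$fields1 = str_getcsv($line1, ';', '"', '\\');
$fields2 = str_getcsv($line2, ';', '"', '\\');

// Line 1: the backslash protects the closing quote, so fields 2 and 3 merge.
echo count($fields1), " fields\n";   // 2 fields, as in the report
echo $fields1[1], "\n";              // test text \";12345"

// Line 2: three fields, but both backslashes stay in the output.
echo count($fields2), " fields\n";   // 3 fields
echo $fields2[1], "\n";              // test text \\
```

In both cases the escape character is kept in the parsed field; it only
changes how the character that follows it is interpreted.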

> Incorrect three-field CSV file:

Actually, this is a correct CSV file according to RFC 4180[1].  It
is not properly treated by fgetcsv(), though, unless you change
the $escape character to a character which is not contained in the
file (e.g. "\0").  There is already feature request #51496, which
would allow setting the $escape character to an empty string, to
avoid the doubtful escaping.

Therefore I'm closing this ticket as duplicate.

[1] <https://tools.ietf.org/html/rfc4180>
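
The workaround mentioned above (an $escape character that does not
occur in the file) could look roughly like the sketch below.  It is
not a definitive recipe: the in-memory stream and the ASCII stand-in
"test text" for the original Cyrillic field are assumptions made to
keep the example self-contained.

```php
<?php
// The line from the first test file, with an ASCII stand-in for the text.
$line = <<<'CSV'
"05.10.13 12:30:06";"test text \";"12345"
CSV;

// Put the line into an in-memory stream so no file in /tmp is needed.
$handle = fopen('php://memory', 'w+');
fwrite($handle, $line . "\n");
rewind($handle);

// "\0" does not occur in the data, so the backslash is ordinary text
// and the quote after it closes the second field.
$record = fgetcsv($handle, 0, ';', '"', "\0");
fclose($handle);

foreach ($record as $field) {
    echo $field, "\n";
}
// Prints three fields:
// 05.10.13 12:30:06
// test text \
// 12345
```

This only works as long as the chosen $escape character really never
appears in the input.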
 