Bug #72330 CSV fields incorrectly split if escape char followed by UTF chars
Submitted: 2016-06-03 16:00 UTC
Modified: 2016-07-21 16:13 UTC
From: cronfy at gmail dot com
Assigned: cmb
Status: Closed
Package: Strings related
PHP Version: Irrelevant
OS: Linux Mint 17.1 Rebecca
Private report: No
CVE-ID: None
 [2016-06-03 16:00 UTC] cronfy at gmail dot com
Description:
------------
When the escape character set for str_getcsv() is followed by certain UTF-8 characters, the string is parsed incorrectly.

I tested it on PHP 5.4, 5.5, 5.6 and 7.0; the behavior is the same.

Test script:
---------------
$utf_1 = chr(0xD1) . chr(0x81); // U+0441
$utf_2 = chr(0xD8) . chr(0x80); // U+0600

$string = '"first #' . $utf_1 . $utf_2 . '";"second one"';
$d = str_getcsv($string, ';', '"', "#");

print_r($d);



Expected result:
----------------
Array
(
    [0] => first #с؀
    [1] => second one
)


Actual result:
--------------
Array
(
    [0] => first #с؀";second one"
)


 [2016-06-14 13:21 UTC] cmb@php.net
-Status: Open +Status: Feedback
 [2016-06-14 13:21 UTC] cmb@php.net
I can't reproduce this behavior, see <https://3v4l.org/QZJSC>.
Maybe it's locale dependent?
 [2016-06-15 00:08 UTC] cmb@php.net
-Assigned To: +Assigned To: cmb
 [2016-06-26 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 [2016-07-17 11:07 UTC] cronfy at gmail dot com
-Status: No Feedback +Status: Closed
 [2016-07-17 11:07 UTC] cronfy at gmail dot com
Yes, it is locale dependent. If I prepend the script with 'setlocale(LC_ALL, "C")', everything works correctly. But if I don't (my default locale is ru_RU.UTF-8), or if I explicitly use 'setlocale(LC_ALL, "ru_RU")', the result is incorrect.

I can't demonstrate it at 3v4l.org as it does not support required locales.
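
For readers on an affected build, the workaround described above could be wrapped roughly as follows. This is only a sketch under assumptions: the helper name parse_csv_line_c_locale() is hypothetical, it needs PHP 7.0 or later for the type declarations, and it assumes that temporarily switching the whole process to the "C" locale is acceptable for the script.

```php
<?php
// Hypothetical helper (not part of PHP): parse one CSV line with the
// locale forced to "C", then restore whatever locale was active before.
function parse_csv_line_c_locale(
    string $line,
    string $delimiter = ';',
    string $enclosure = '"',
    string $escape = '#'
): array {
    // Passing "0" only queries the current locale; it does not change it.
    $previous = setlocale(LC_ALL, '0');
    setlocale(LC_ALL, 'C');
    try {
        return str_getcsv($line, $delimiter, $enclosure, $escape);
    } finally {
        // Restore the caller's locale if the query above succeeded.
        if ($previous !== false) {
            setlocale(LC_ALL, $previous);
        }
    }
}

// Same input as in the test script of this report.
$utf_1 = chr(0xD1) . chr(0x81); // U+0441
$utf_2 = chr(0xD8) . chr(0x80); // U+0600
$fields = parse_csv_line_c_locale('"first #' . $utf_1 . $utf_2 . '";"second one"');
print_r($fields);
```

Note that setlocale() affects the whole process, so in threaded server setups this kind of switch may have side effects elsewhere.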
 [2016-07-17 11:08 UTC] cronfy at gmail dot com
I can't change this bug status to Reopened, because I do not have enough privileges.
 [2016-07-17 11:45 UTC] cmb@php.net
-Status: Closed +Status: Re-Opened
 [2016-07-17 11:45 UTC] cmb@php.net
Thanks, cronfy.

Might be related to bug #55507.
 [2016-07-21 16:13 UTC] cmb@php.net
-Summary: str_getcsv() splits fields incorrectly if escape char flollowed by UTF chars +Summary: CSV fields incorrectly split if escape char followed by UTF chars
 [2016-07-21 16:13 UTC] cmb@php.net
This issue does not only affect str_getcsv(), but also other CSV
reading functions, so I've changed the bug title.
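
The comment above does not name the other functions. As an assumption for illustration, fgetcsv() is used below with the same input as the original test script, read from an in-memory stream. On an affected build running under a UTF-8 locale it may return a single merged field; on a fixed build it should return two.

```php
<?php
// Same bytes as in the original test script.
$utf_1 = chr(0xD1) . chr(0x81); // U+0441
$utf_2 = chr(0xD8) . chr(0x80); // U+0600
$line  = '"first #' . $utf_1 . $utf_2 . '";"second one"' . "\n";

// Write the line to an in-memory stream so that no temporary file is needed.
$stream = fopen('php://memory', 'w+');
fwrite($stream, $line);
rewind($stream);

// Length 0 means "no line length limit"; delimiter, enclosure and escape
// character match the str_getcsv() call from the report.
$row = fgetcsv($stream, 0, ';', '"', '#');
fclose($stream);

print_r($row);
```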
 [2016-07-21 17:13 UTC] cmb@php.net
Automatic comment on behalf of cmb
Revision: http://git.php.net/?p=php-src.git;a=commit;h=f2c2a4be9e466f14677089efe33e20ca0b146809
Log: Fix #72330: CSV fields incorrectly split if escape char followed by UTF chars
 [2016-07-21 17:13 UTC] cmb@php.net
-Status: Re-Opened +Status: Closed