Doc Bug #81284 str_getcsv cant parse with multibyte delimiter
Submitted: 2021-07-23 13:01 UTC
Modified: 2021-07-23 13:23 UTC
From: mx03 at volny dot cz
Assigned:
Status: Closed
Package: Strings related
PHP Version: Irrelevant
OS:
Private report: No
CVE-ID: None
 [2021-07-23 13:01 UTC] mx03 at volny dot cz
Description:
------------
The separator can only be a single character, and it must not be multibyte.

If the locale is set to UTF-8, the CSV cannot be parsed at all:
https://3v4l.org/MiD3B

If the locale is not UTF-8, the string is parsed, but the results contain 2 additional bytes:
https://3v4l.org/ohc4Q

Test script:
---------------
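<?php
// Use a UTF-8 locale; str_getcsv takes the locale settings into account.
setlocale(LC_CTYPE, "en_US.UTF-8");

// The separator is a single character, but three bytes in UTF-8.
$test = "❤";

$line = "test1❤test2❤test3";
$header = str_getcsv($line, $test);
var_dump($header);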

Expected result:
----------------
array(3) {
  [0]=>
  string(5) "test1"
  [1]=>
  string(5) "test2"
  [2]=>
  string(5) "test3"
}

Actual result:
--------------
array(1) {
  [0]=>
  string(21) "test1❤test2❤test3"
}
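
A possible workaround, given only as a sketch: replace the multibyte separator with a single-byte placeholder before calling str_getcsv, then restore it inside the parsed fields. The helper name str_getcsv_multibyte and the placeholder byte "\x1F" are assumptions made for this illustration; neither is part of PHP, and the approach only works if the placeholder never occurs in the input.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): parse one CSV line whose
 * separator is a multibyte string by swapping in a single-byte placeholder.
 */
function str_getcsv_multibyte(string $line, string $separator, string $placeholder = "\x1F"): array
{
    // Assumption: the placeholder byte must not already occur in the data.
    if (strpos($line, $placeholder) !== false) {
        throw new InvalidArgumentException('Placeholder byte already occurs in the input.');
    }

    // Parse with the single-byte placeholder as the separator.
    // Enclosure and escape are passed explicitly instead of relying on defaults.
    $fields = str_getcsv(str_replace($separator, $placeholder, $line), $placeholder, '"', '\\');

    // Separators inside quoted fields were replaced as well; put them back.
    return array_map(
        static function ($field) use ($separator, $placeholder) {
            return $field === null ? null : str_replace($placeholder, $separator, $field);
        },
        $fields
    );
}

// "\u{2764}" is the heart character used as the separator in the report.
$fields = str_getcsv_multibyte("test1\u{2764}test2\u{2764}test3", "\u{2764}");
var_dump($fields);
```

For input without quoting, explode($separator, $line) would be a simpler alternative, since explode accepts a separator of any byte length.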

 [2021-07-23 13:22 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2021-07-23 13:22 UTC] cmb@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php
 [2021-07-23 13:23 UTC] cmb@php.net
-Status: Not a bug
+Status: Open
-Type: Bug
+Type: Documentation Problem
-Assigned To: cmb
+Assigned To:
 [2021-07-23 13:23 UTC] cmb@php.net
Well, the documentation says "one character", which is misleading.
 [2021-07-23 13:40 UTC] mx03 at volny dot cz
If the delimiter is a UTF-8 string, one character is one character, even if it is multibyte.

And the note makes it sound as if str_getcsv fully supports UTF-8 strings:
"The locale settings are taken into account by this function. If LC_CTYPE is e.g. en_US.UTF-8, strings in one-byte encodings may be read wrongly by this function."

But UTF-8 is only supported in the $string parameter, not in the $separator.

There should at least be a note that describes this problem, like the one for strlen, which only counts bytes.
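
The comparison with strlen can be made concrete with a short sketch. It only inspects the bytes of the separator from the report; the variable names are chosen for this illustration. The heart is one character but three bytes in UTF-8, which is consistent with the 2 additional bytes seen in the non-UTF-8 locale output if only the first byte acts as the separator.

```php
<?php
// U+2764 (the heart used in the report), written as an escape so the
// example does not depend on the encoding of this source file.
$separator = "\u{2764}";

$byteLength = strlen($separator);       // strlen() counts bytes, not characters
$hexBytes   = bin2hex($separator);      // all UTF-8 bytes of the separator, in hex
$firstByte  = bin2hex($separator[0]);   // only the first byte, in hex

echo $byteLength, "\n"; // 3
echo $hexBytes, "\n";   // e29da4
echo $firstByte, "\n";  // e2
```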
 [2021-07-23 15:24 UTC] git@php.net
Automatic comment on behalf of cmb69
Revision: https://github.com/php/doc-en/commit/7d93e0fe5eabef4325e6fc33abc329d1d370be86
Log: Fix #81284: str_getcsv cant parse with multibyte delimiter
 [2021-07-23 15:24 UTC] git@php.net
-Status: Open
+Status: Closed
 [2021-07-24 02:43 UTC] git@php.net
Automatic comment on behalf of mumumu
Revision: https://github.com/php/doc-ja/commit/b305c6ee2a5531b29aaf4f049463d72adbc7240e
Log: Fix #81284: str_getcsv cant parse with multibyte delimiter