Doc Bug #81284 str_getcsv cant parse with multibyte delimiter
Submitted: 2021-07-23 13:01 UTC
Modified: 2021-07-23 13:23 UTC
From: mx03 at volny dot cz
Assigned:
Status: Closed
Package: Strings related
PHP Version: Irrelevant
OS:
Private report: No
CVE-ID: None
 [2021-07-23 13:01 UTC] mx03 at volny dot cz
Description:
------------
The separator can only be a single character, and that character must not be multibyte.

If the locale is set to UTF-8, it can't parse the CSV:
https://3v4l.org/MiD3B

If the locale is not set to UTF-8, it parses the string, but the resulting fields contain 2 additional bytes:
https://3v4l.org/ohc4Q
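
The 2 additional bytes can be reproduced without a multibyte separator at all. The following is a minimal sketch, assuming the affected PHP versions effectively use only the first byte of the separator (0xE2 for "❤", which is 0xE2 0x9D 0xA4 in UTF-8). It pins LC_CTYPE to "C" and passes that single byte explicitly; the enclosure and escape arguments are only spelled out to avoid relying on defaults.

```php
<?php
// Pin the locale so the parser steps through the input byte by byte.
setlocale(LC_CTYPE, "C");

// "\u{2764}" is the heart character, 3 bytes in UTF-8: E2 9D A4.
$line = "test1\u{2764}test2\u{2764}test3";

// Split on the first byte only; the other two bytes stay in the data.
$fields = str_getcsv($line, "\xE2", '"', '\\');

foreach ($fields as $field) {
    // Print the byte length and a hex dump of each field.
    printf("%d bytes: %s\n", strlen($field), bin2hex($field));
}
```

Under these assumptions the second and third fields should come out as 7 bytes each, with the leftover bytes 9d a4 at the start.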

Test script:
---------------
setlocale(LC_CTYPE, "en_US.UTF-8");

$test = "❤";

$line = "test1❤test2❤test3";
$header = str_getcsv($line, $test);
var_dump($header);

Expected result:
----------------
array(3) {
  [0]=>
  string(5) "test1"
  [1]=>
  string(5) "test2"
  [2]=>
  string(5) "test3"
}

Actual result:
--------------
array(1) {
  [0]=>
  string(21) "test1❤test2❤test3"
}
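
Until the separator can be multibyte, one possible workaround is to swap it for a single-byte placeholder before parsing. This is only a sketch: str_getcsv_multibyte is a hypothetical helper, not part of PHP, and it assumes the byte 0x1F (ASCII unit separator) never occurs in the input.

```php
<?php
// Hypothetical helper (not a PHP built-in): parse a CSV line whose
// separator is a multibyte string by substituting a one-byte placeholder.
function str_getcsv_multibyte(string $line, string $separator): array
{
    $placeholder = "\x1F"; // assumption: this byte is absent from the data

    if (strpos($line, $placeholder) !== false) {
        throw new InvalidArgumentException('Input already contains the placeholder byte.');
    }

    // Replace every occurrence of the separator, then parse as usual.
    $converted = str_replace($separator, $placeholder, $line);
    $fields = str_getcsv($converted, $placeholder, '"', '\\');

    // Separators inside quoted fields were replaced too; restore them.
    return array_map(
        static function ($field) use ($separator, $placeholder) {
            return $field === null ? null : str_replace($placeholder, $separator, $field);
        },
        $fields
    );
}

$fields = str_getcsv_multibyte("test1\u{2764}test2\u{2764}test3", "\u{2764}");
var_dump($fields);
```

For plain unquoted data like the test script, explode($separator, $line) would also do, since it compares whole byte sequences; the helper is only worthwhile when quoted fields matter.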

 [2021-07-23 13:22 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2021-07-23 13:22 UTC] cmb@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php
 [2021-07-23 13:23 UTC] cmb@php.net
-Status: Not a bug
+Status: Open
-Type: Bug
+Type: Documentation Problem
-Assigned To: cmb
+Assigned To:
 [2021-07-23 13:23 UTC] cmb@php.net
Well, the documentation says "one character", which is misleading.
 [2021-07-23 13:40 UTC] mx03 at volny dot cz
If the delimiter is a UTF-8 string, one character is one character, even if it is multibyte.

And the note makes it sound as if UTF-8 strings are fully supported by str_getcsv:
"The locale settings are taken into account by this function. If LC_CTYPE is e.g. en_US.UTF-8, strings in one-byte encodings may be read wrongly by this function."

But UTF-8 is only supported in the $string parameter, not in $separator.

There should at least be a note describing this limitation, like the one for strlen, which says that it counts bytes rather than characters.
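
To illustrate the byte versus character distinction for the separator itself, here is a small sketch. Counting characters with preg_match_all and the /u modifier is just one option chosen for this example (mb_strlen would be another, assuming the mbstring extension is loaded).

```php
<?php
// The separator from the test script, written as a Unicode escape.
$separator = "\u{2764}";

// strlen() counts bytes, so a single heart character reports 3.
var_dump(strlen($separator));

// The raw UTF-8 bytes of the separator.
var_dump(bin2hex($separator));

// With the /u modifier, "." matches one whole UTF-8 character,
// so the number of matches is the character count.
var_dump(preg_match_all('/./us', $separator));
```

This should report 3 bytes (e29da4) but only 1 character, which is the gap between what "one character" suggests and what the function accepts.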
 [2021-07-23 15:24 UTC] git@php.net
Automatic comment on behalf of cmb69
Revision: https://github.com/php/doc-en/commit/7d93e0fe5eabef4325e6fc33abc329d1d370be86
Log: Fix #81284: str_getcsv cant parse with multibyte delimiter
 [2021-07-23 15:24 UTC] git@php.net
-Status: Open
+Status: Closed
 [2021-07-24 02:43 UTC] git@php.net
Automatic comment on behalf of mumumu
Revision: https://github.com/php/doc-ja/commit/b305c6ee2a5531b29aaf4f049463d72adbc7240e
Log: Fix #81284: str_getcsv cant parse with multibyte delimiter
 