When the escape character set for str_getcsv() is followed by certain UTF-8 characters, the string is parsed incorrectly.
I tested it on PHP 5.4, 5.5, 5.6 and 7.0; the behavior is the same.
$utf_1 = chr(0xD1) . chr(0x81); // U+0441
$utf_2 = chr(0xD8) . chr(0x80); // U+0600
$string = '"first #' . $utf_1 . $utf_2 . '";"second one"';
$d = str_getcsv($string, ';', '"', "#");
print_r($d);
Expected result:
 [0] => first #с
 [1] => second one
Actual result:
 [0] => first #с";second one"
I can't reproduce this behavior, see <https://3v4l.org/QZJSC>.
Maybe it's locale dependent?
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
Yes, it is locale dependent. If I prepend the script with 'setlocale(LC_ALL, "C")', everything works correctly. But if I don't (my default locale is ru_RU.UTF-8) or if I explicitly use 'setlocale(LC_ALL, "ru_RU")', the result is incorrect.
I can't demonstrate it at 3v4l.org as it does not support the required locales.
I can't change this bug status to Reopened, because I do not have enough privileges.
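The locale dependence described above could be checked with a sketch like the following, which parses the same string once per locale and reports the field count. The helper name count_csv_fields_under_locale() is hypothetical, and the script assumes the ru_RU.UTF-8 locale may or may not be installed; it reports that case rather than failing. On an affected PHP version the UTF-8 locale would be expected to yield one field instead of two.

```php
<?php
// Hypothetical helper (not part of PHP): switch to $locale, parse the
// test string from this report, and return the number of fields found.
// Returns null when the locale is not installed on this system.
// Type declarations are omitted so the sketch also runs on PHP 5.4.
function count_csv_fields_under_locale($locale)
{
    // setlocale() returns false if the requested locale is unavailable.
    if (setlocale(LC_ALL, $locale) === false) {
        return null;
    }

    // Escape character '#' followed by U+0441 and U+0600 (UTF-8 encoded).
    $string = '"first #' . "\xD1\x81" . "\xD8\x80" . '";"second one"';

    return count(str_getcsv($string, ';', '"', '#'));
}

foreach (array('C', 'ru_RU.UTF-8') as $locale) {
    $count = count_csv_fields_under_locale($locale);
    if ($count === null) {
        echo $locale, ": locale not available\n";
    } else {
        echo $locale, ': ', $count, " field(s)\n";
    }
}
```

Two fields under every available locale indicates correct parsing; a single field under the UTF-8 locale reproduces the problem reported here.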
Might be related to bug #55507.
This issue does not only affect str_getcsv(), but also other CSV
reading functions, so I've changed the bug title.
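As an illustration of the point that other CSV reading functions are affected, the same input can be fed through fgetcsv() on an in-memory stream. This is only a sketch: the comment above does not name the affected functions, so the choice of fgetcsv() is an assumption, and parse_line_via_fgetcsv() is a hypothetical helper.

```php
<?php
// Hypothetical helper (not part of PHP): parse a single CSV line with
// fgetcsv() by writing it to an in-memory stream first.
// Type declarations are omitted so the sketch also runs on PHP 5.4.
function parse_line_via_fgetcsv($line)
{
    $stream = fopen('php://memory', 'w+');
    fwrite($stream, $line . "\n");
    rewind($stream);

    // Same delimiter, enclosure and escape character as in the report.
    // A length of 0 means the line length is not limited.
    $fields = fgetcsv($stream, 0, ';', '"', '#');
    fclose($stream);

    // fgetcsv() returns false on failure; normalize to an empty array.
    return $fields === false ? array() : $fields;
}

// Escape character '#' followed by U+0441 and U+0600 (UTF-8 encoded).
$line = '"first #' . "\xD1\x81" . "\xD8\x80" . '";"second one"';
print_r(parse_line_via_fgetcsv($line));
```

To compare against the locale behavior discussed earlier in this report, call setlocale() before invoking the helper.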
Automatic comment on behalf of cmb
Log: Fix #72330: CSV fields incorrectly split if escape char followed by UTF chars