[2005-07-28 10:59 UTC] feldgendler at mail dot ru
Description:
------------
The test case source file is itself saved in UTF-8, and the quoted string contains Cyrillic letters. If I save the source in KOI8-R (a single-byte Cyrillic encoding) and change the second argument of setlocale() to "ru_RU.KOI8-R", the result is what I expect. This suggests that the bug occurs only with multi-byte characters, since every character in KOI8-R is a single byte.
Relevant PHP configuration options:
--enable-mbstring=all
(--enable-zend-multibyte was not specified)
Relevant environment variables:
LANG=en_US.UTF-8
(LC_* are not set)
Reproduce code:
---------------
<?php
setlocale(LC_CTYPE, "en_US.UTF-8");
echo basename("english/???????");
?>
Expected result:
----------------
???????
Actual result:
--------------
english
I've explored the source code of the php_basename() function, and here is what I found:

In case of a multi-byte character (inc_len > 1) that immediately follows a slash, state is not changed to 1 because that code is skipped. The following code:

    if (state == 0) {
        comp = c;
        state = 1;
    }

...needs to be inserted at the point marked below:

    while (cnt > 0) {
        inc_len = (*c == '\0' ? 1: php_mblen(c, cnt));
        switch (inc_len) {
            case -2:
            case -1:
                inc_len = 1;
                php_mblen(NULL, 0);
                break;
            case 0:
                goto quit_loop;
            case 1:
    #if defined(PHP_WIN32) || defined(NETWARE)
                if (*c == '/' || *c == '\\') {
    #else
                if (*c == '/') {
    #endif
                    if (state == 1) {
                        state = 0;
                        cend = c;
                    }
                } else {
                    if (state == 0) {
                        comp = c;
                        state = 1;
                    }
                }
            default:
    -- HERE IT GOES -->
                break;
        }
        c += inc_len;
        cnt -= inc_len;
    }

Can I expect that this bug will be fixed in CVS?
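
For illustration, here is a minimal standalone sketch of the same state machine with the proposed check added to the default branch. It is not the PHP source: utf8_len() is a hypothetical, simplified stand-in for php_mblen() that assumes UTF-8 input regardless of locale, and basename_sketch() is a made-up name. One point that appears to matter: in the loop as quoted, case 1 falls through into default, so a slash would reset state to 0 and then immediately be taken as the start of a new component. The sketch therefore ends case 1 with its own break; that is my reading of the quoted code, not a confirmed patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for php_mblen(), assuming UTF-8 input:
 * returns the byte length of the sequence at s, 0 at a NUL or when
 * n == 0, and -1 for a malformed or truncated sequence. Simplified:
 * it does not reject overlong encodings. */
static int utf8_len(const char *s, size_t n)
{
    unsigned char b;
    int len, i;

    if (n == 0 || s[0] == '\0') return 0;
    b = (unsigned char)s[0];
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) len = 2;
    else if ((b & 0xF0) == 0xE0) len = 3;
    else if ((b & 0xF8) == 0xF0) len = 4;
    else return -1;
    if ((size_t)len > n) return -1;
    for (i = 1; i < len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80) return -1;
    }
    return len;
}

/* Copies the last path component of path into out (capacity outsz,
 * truncating if needed) and returns the number of bytes copied. */
size_t basename_sketch(const char *path, char *out, size_t outsz)
{
    const char *c = path, *comp = path, *cend = path;
    size_t cnt = strlen(path), len;
    int state = 0; /* 0: between components, 1: inside a component */

    while (cnt > 0) {
        int inc_len = utf8_len(c, cnt);
        switch (inc_len) {
            case -1:
                inc_len = 1; /* skip one invalid byte, as the quoted loop does */
                break;
            case 0:
                goto quit_loop;
            case 1:
                if (*c == '/') {
                    if (state == 1) {
                        state = 0;
                        cend = c;
                    }
                } else if (state == 0) {
                    comp = c;
                    state = 1;
                }
                break; /* added here: do not fall through into default */
            default:
                /* Proposed fix: a multi-byte character can also start a component. */
                if (state == 0) {
                    comp = c;
                    state = 1;
                }
                break;
        }
        c += inc_len;
        cnt -= (size_t)inc_len;
    }
quit_loop:
    if (state == 1) {
        cend = c; /* the path ended inside a component */
    }
    len = (size_t)(cend - comp);
    if (outsz == 0) return 0;
    if (len >= outsz) len = outsz - 1;
    memcpy(out, comp, len);
    out[len] = '\0';
    return len;
}
```

With the default branch left empty, as in the quoted loop, a path such as "english/" followed by Cyrillic letters never leaves state 0 after the slash, so comp and cend still delimit "english", which matches the actual result above. With the check in place, comp moves to the first multi-byte character and the Cyrillic component is returned.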