Bug #49958 mb_strtoupper fails on Japanese characters
Submitted: 2009-10-22 16:14 UTC Modified: 2013-02-18 00:34 UTC
Votes:2
Avg. Score:2.5 ± 0.5
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:1 (50.0%)
From: mjong at magnafacta dot nl Assigned: moriyoshi (profile)
Status: No Feedback Package: mbstring related
PHP Version: 5.2.11 OS: Windows Vista
Private report: No CVE-ID: None

 [2009-10-22 16:14 UTC] mjong at magnafacta dot nl
Description:
------------
Using strtoupper, mb_strtoupper and derived functions like ucfirst 
produces incorrectly encoded strings when used on strings containing 
Japanese Hiragana or Katakana characters.

As no uppercase versions of these characters exist, they should be 
treated like e.g. numbers, i.e. left unchanged.

Workaround: use mb_check_encoding to revert to the old string when this 
happens.
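
A minimal sketch of that workaround, assuming PHP 7 or later (for the type declarations) and UTF-8 input. The helper name upper_or_original() is hypothetical and not part of PHP:

```php
<?php
// Hypothetical helper (not part of PHP): uppercase a string, but fall back
// to the original when the result is no longer valid in the given encoding.
function upper_or_original(string $s, string $encoding = 'UTF-8'): string
{
    $upper = strtoupper($s);
    return mb_check_encoding($upper, $encoding) ? $upper : $s;
}

// Same input as in the reproduce code of this report.
$input = base64_decode('44Gm44GI44GZ') . ' and ' . base64_decode('44K444On44Oz44Kw');
echo upper_or_original($input), PHP_EOL;
```

Whether the fallback branch is taken depends on the PHP version and locale, so the only guarantee is that valid input always gives a validly encoded result.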

Reproduce code:
---------------
// $s = strtoupper('まてえす and ジョング');
$s = strtoupper(
   base64_decode('44Gm44GI44GZ').
   ' and '.
   base64_decode('44K444On44Oz44Kw'));


if (mb_check_encoding($s)) {
  echo $s;
} else {
  echo 'Error';
}


Expected result:
----------------
まてえす and ジョング

Actual result:
--------------
Error

History

 [2009-10-23 10:21 UTC] jani@php.net
mb_strtoupper() defaults to mb_internal_encoding(), so what does the latter give you? If it's not the right encoding, then there is no bug here. And strtoupper() or ucfirst() or anything without mb_* in front of it isn't even supposed to work with such strings. :)
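
To illustrate the point about the default encoding, here is a small sketch that passes the encoding to mb_strtoupper() explicitly instead of relying on mb_internal_encoding(). It assumes the input really is UTF-8, as the reporter states later in the thread:

```php
<?php
// Same input as in the reproduce code of this report (UTF-8 bytes).
$input = base64_decode('44Gm44GI44GZ') . ' and ' . base64_decode('44K444On44Oz44Kw');

// Show what the extension assumes when no encoding argument is given.
echo mb_internal_encoding(), PHP_EOL;

// Passing the encoding explicitly removes the dependency on that setting.
$upper = mb_strtoupper($input, 'UTF-8');
echo $upper, PHP_EOL;

// Validate against the same explicit encoding.
var_dump(mb_check_encoding($upper, 'UTF-8'));
```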
 [2009-10-25 14:40 UTC] mjong at magnafacta dot nl
Ah... I assumed that mb_check_encoding() would use 
mb_internal_encoding() when none is specified.

Anyhow, I used UTF-8 as internal encoding, but I have just now tested 
it with all internal encodings (setting them using 
mb_internal_encoding()).
 [2009-10-25 14:41 UTC] mjong at magnafacta dot nl
[PLEASE ALLOW LONGER ENTRIES] 

Using strtoupper() all encodings create nonsense strings, but in half 
the cases the error can be detected using mb_check_encoding(). 
Strangely enough the 2-byte encodings including UTF-16 cannot be 
checked, while UTF-8 and the 4-byte encodings do OK.

Using mb_strtoupper() a third of the encodings do a proper 
translation. Half of the encodings generate nonsense and none of them 
can be tested using mb_check_encoding(). The rest of the encodings 
fail when using them with mb_strtoupper(). 
With mb_strtoupper() UTF-8 and UTF-16 work OK, but now UTF-32 fails. 
"Stranger and stranger", Alice said.
 [2009-10-26 08:46 UTC] jani@php.net
Please read this page and especially the last two examples: http://www.php.net/manual/en/mbstring.configuration.php 

Are you using the proper options?
 [2009-10-26 10:12 UTC] mjong at magnafacta dot nl
mbstring.func_overload = 0. Setting it to 2 or 7 just means there is 
no difference between calling strtoupper() and mb_strtoupper(), but 
ucfirst() still produces incorrectly encoded strings. This is using 
UTF-8 for internal encoding. Using UTF-16 and UTF-32 basically crashes 
the program. 

mbstring.strict_detection = Off. Setting it to On has no effect 
whatsoever, i.e. if I check with phpinfo() it stays Off, no matter 
what I specify as value. This is not the case with other php.ini 
settings. Besides, there is no documentation on its effect.
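
As an alternative to reading phpinfo(), the effective values can be printed from the script itself. This is only a sketch. Note that mbstring.func_overload was removed in PHP 8.0, so on newer builds ini_get() is expected to return false for it:

```php
<?php
// ini_get() returns the current value as a string, or false when the
// directive does not exist in this PHP build.
foreach (['mbstring.func_overload', 'mbstring.strict_detection'] as $name) {
    $value = ini_get($name);
    echo $name, ' = ', var_export($value, true), PHP_EOL;
}

// The encoding that the mb_* functions fall back to when none is passed.
echo 'internal encoding = ', mb_internal_encoding(), PHP_EOL;
```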

I really researched this problem - but cannot show you because 
then this form says I am spamming - but the crux of this problem is 
that UTF-8, UTF-16 and UTF-32 each behave differently when used with 
strtoupper() and mb_strtoupper() and all can result in strings that are 
not valid encodings.

I have a working solution for my case, but I think this is a solvable 
bug.
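
The comparison described above can be reproduced with a short script. This is a sketch under assumptions, not the reporter's own test: it converts the sample string into each encoding, uppercases it with that encoding passed explicitly, and reports whether the result is still valid. Results may differ between PHP versions, which is the point of running it:

```php
<?php
// Same input as in the reproduce code of this report (UTF-8 bytes).
$utf8 = base64_decode('44Gm44GI44GZ') . ' and ' . base64_decode('44K444On44Oz44Kw');

$results = [];
foreach (['UTF-8', 'UTF-16', 'UTF-32'] as $enc) {
    // Re-encode the sample, then uppercase it in that same encoding.
    $encoded = mb_convert_encoding($utf8, $enc, 'UTF-8');
    $upper   = mb_strtoupper($encoded, $enc);

    $results[$enc] = [
        'valid' => mb_check_encoding($upper, $enc),
        // Convert back to UTF-8 so the results can be compared and printed.
        'utf8'  => mb_convert_encoding($upper, 'UTF-8', $enc),
    ];

    printf(
        "%-6s valid=%s result=%s\n",
        $enc,
        var_export($results[$enc]['valid'], true),
        $results[$enc]['utf8']
    );
}
```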
 [2009-10-26 11:09 UTC] jani@php.net
Ever heard of http://www.pastebin.com/ ?
 [2009-11-03 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2010-04-25 20:48 UTC] felipe@php.net
-Status: No Feedback
+Status: Feedback
-Assigned To:
+Assigned To: moriyoshi
 [2010-04-25 20:48 UTC] felipe@php.net
Please try using this snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows:

  http://windows.php.net/snapshots/


 [2011-12-05 10:52 UTC] magog dot the dot ogre at gmail dot com
Description:
-----------
I actually just encountered this problem as well with ucfirst() but not mb_strtoupper(). I would not even comment except that PHP 5.3.8 fails on using ucfirst() whereas 5.3.6 succeeds. As such, my code that had worked previously had stopped working before I put in a workaround using mb_strtoupper().

I do not know if PHP considers this to be a problem or not, since technically the software writer should not have used ucfirst() to begin with. However, it seems like a problem to me when an ostensibly minor update has the result of unnecessarily breaking code.

It did not work properly on two systems I used, and did work properly on one system. Systems I tried:
*did NOT work - PHP 5.3.8 (cli) (built: Aug 23 2011 12:14:39)
*did NOT work - PHP 5.3.8 (cli) (built: Nov  8 2011 22:18:15) (custom build, IIRC)
*worked properly - PHP 5.3.6-13ubuntu3.2 with Suhosin-Patch (cli) (built: Oct 13 2011 23:09:42) 

Reproduce code:
---------------
echo ucfirst(urldecode("%E6%97%A5%E5%85%89%E6%9D%B1%E7%85%A7%E5%AE%AE%E9%99%BD%E6%98%8E%E9%96%80"));

Expected result: 
日光東照宮陽明門

Result given in failing versions above:
Ɨ�光東照宮陽明門
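
The failing output starts with Ɨ (U+0197, UTF-8 bytes C6 97), while the expected string starts with 日 (bytes E6 97 A5). That is consistent with ucfirst() changing only the first byte from 0xE6 to 0xC6, although this is an inference from the output and not something confirmed in this report. The sketch below simulates that change by hand and then shows a character-based alternative. The helper name first_char_upper() is hypothetical and assumes UTF-8 input:

```php
<?php
// The string from the report: 日光東照宮陽明門 in UTF-8.
$s = urldecode('%E6%97%A5%E5%85%89%E6%9D%B1%E7%85%A7%E5%AE%AE%E9%99%BD%E6%98%8E%E9%96%80');

// Simulate the suspected corruption: replace the lead byte 0xE6 with 0xC6,
// as a single-byte (locale-based) uppercase mapping would.
$corrupted = "\xC6" . substr($s, 1);

// Invalid UTF-8: 0xC6 0x97 decodes as U+0197 and 0xA5 is left dangling.
var_dump(mb_check_encoding($corrupted, 'UTF-8'));

// Hypothetical helper: uppercase the first character, not the first byte.
function first_char_upper(string $s, string $encoding = 'UTF-8'): string
{
    $first = mb_substr($s, 0, 1, $encoding);
    $rest  = mb_substr($s, 1, null, $encoding);
    return mb_strtoupper($first, $encoding) . $rest;
}

echo first_char_upper($s), PHP_EOL;
```

Because the helper works on characters, a kanji or kana first character has no uppercase form and comes back unchanged.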
 [2013-02-18 00:34 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.
 