PHP :: Bug #42063 :: strtoupper() and locales: inconsistent behavior
Bug #42063 strtoupper() and locales: inconsistent behavior
Submitted: 2007-07-21 20:27 UTC
Modified: 2007-07-22 10:44 UTC
Votes: 6
Avg. Score: 5.0 ± 0.0
Reproduced: 6 of 6 (100.0%)
Same Version: 2 (33.3%)
Same OS: 2 (33.3%)
From: jo at feuersee dot de
Assigned:
Status: Wont fix
Package: I18N and L10N related
PHP Version: 5.2.3
OS: Linux
Private report: No
CVE-ID: None

 [2007-07-21 20:27 UTC] jo at feuersee dot de
Description:
------------
I stumbled upon this issue while debugging a project that uses PEAR XML_Parser, which relies on the assumption that strtoupper() works for all US-ASCII characters regardless of locale or encoding.

Unfortunately, it does not.
This has been discussed in PHP bugs #22003, #21771, and #35583.

The letter i isn't converted to I by strtoupper() when the locale is set to Turkish via setlocale(LC_ALL, 'tr_TR').
However, the letter i is converted to I by strtoupper() when the locale is set to Turkish via setlocale(LC_ALL, 'tr').
mb_strtoupper() does work under all encodings.

Now, instead of blaming the Unicode guys or the Turkish language or whatever:

How are we supposed to code properly under these circumstances?

My proposal:
strtoupper/lower() claim to be locale aware, but they really aren't (try to uppercase the umlauts ä, ö, ü under locale de_DE: it doesn't work at all). Thus, redefining these functions to work on 7-bit (US-ASCII) data _only_ would not change anything in practice. Drop the locale dependency, and the locale tr_TR will work too.
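
To illustrate what the proposed behavior would look like from userland, here is a minimal sketch. The name ascii_strtoupper() is hypothetical and not part of PHP; it only relies on strtr(), which maps bytes one to one and does not consult the locale.

```php
<?php
// Hypothetical helper (not a PHP built-in): uppercases only the bytes a-z
// and leaves every other byte untouched, whatever the current locale is.
function ascii_strtoupper($s)
{
    return strtr($s, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
}

// setlocale() returns false if tr_TR is not installed; either way the
// helper gives the same result because it never looks at the locale.
setlocale(LC_ALL, 'tr_TR');

printf("%s %s\n", ascii_strtoupper('begin'), bin2hex(ascii_strtoupper('begin')));
```

Bytes outside a-z, including 8-bit characters and UTF-8 sequences, pass through unchanged, which matches the "7-bit only" behavior described above.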


Reproduce code:
---------------
jo@l33t ~> php -r 'setlocale(LC_ALL, "tr"); $text = "begin"; printf("%s %s\n", strtoupper($text), bin2hex(strtoupper($text))); '
BEGIN 424547494e
jo@l33t ~> php -r 'setlocale(LC_ALL, "tr_TR"); $text = "begin"; printf("%s %s\n", strtoupper($text), bin2hex(strtoupper($text))); '
BEGİN 424547dd4e
jo@l33t ~> php -r 'setlocale(LC_ALL, "tr_TR.UTF-8"); $text = "begin"; printf("%s %s\n", strtoupper($text), bin2hex(strtoupper($text))); '
BEGiN 424547694e


Expected result:
----------------
BEGIN 424547494e
(for all locales)

Actual result:
--------------
(see above)
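
A possible workaround on PHP 5, sketched under the assumption that the application needs Turkish formatting (dates, numbers, and so on) but not locale-specific character classification: keep tr_TR for the other categories and reset only LC_CTYPE to the portable "C" locale, since that is the category the C library's case conversion depends on.

```php
<?php
// Request the Turkish locale for all categories. setlocale() returns false
// if tr_TR is not installed, in which case the locale stays unchanged.
setlocale(LC_ALL, 'tr_TR');

// Assumption: locale-aware character handling is not needed, so LC_CTYPE
// alone is switched back to "C", which gives plain ASCII case mapping.
setlocale(LC_CTYPE, 'C');

printf("%s %s\n", strtoupper('begin'), bin2hex(strtoupper('begin')));
```

The trade-off is that strtolower(), ucfirst(), and the ctype_* functions also become ASCII-only for the rest of the request.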



History

 [2007-07-22 10:44 UTC] johannes@php.net
PHP 6 will have support for this. Won't change with PHP 5.