Bug #42063 strtoupper() and locales: inconsistent behavior
Submitted: 2007-07-21 20:27 UTC
Modified: 2007-07-22 10:44 UTC
Votes:6
Avg. Score:5.0 ± 0.0
Reproduced:6 of 6 (100.0%)
Same Version:2 (33.3%)
Same OS:2 (33.3%)
From: jo at feuersee dot de
Assigned:
Status: Wont fix
Package: I18N and L10N related
PHP Version: 5.2.3
OS: Linux
Private report: No
CVE-ID: None
 [2007-07-21 20:27 UTC] jo at feuersee dot de
Description:
------------
I stumbled upon this issue while debugging a project that uses PEAR XML_Parser, which relies on the assumption that strtoupper() works for all US-ASCII characters regardless of locale or encoding.

Unfortunately, it does not.
This has been discussed in PHP bugs #22003, #21771, and #35583.

The letter i is not converted to I by strtoupper() when the locale is set to Turkish via setlocale(LC_ALL, 'tr_TR').
However, the letter i is converted to I by strtoupper() when the locale is set to Turkish via setlocale(LC_ALL, 'tr').
mb_strtoupper() does work under all encodings.
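
A minimal sketch of the mb_strtoupper() observation, assuming the mbstring extension is loaded and the input is UTF-8 (the explicit 'UTF-8' argument and the function_exists() guard are assumptions added for illustration, not part of the report):

```php
<?php
// Try the Turkish locale; setlocale() returns false if it is not installed.
setlocale(LC_ALL, 'tr_TR');

$text = 'begin';

// mb_strtoupper() uses mbstring's own case mapping for the given encoding
// instead of the C library locale, so setlocale() should not affect it.
$upper = function_exists('mb_strtoupper')
    ? mb_strtoupper($text, 'UTF-8')
    : null; // mbstring not available; nothing to demonstrate

if ($upper !== null) {
    printf("%s %s\n", $upper, bin2hex($upper));
}
```

This only helps where mbstring is available, so code that has to run on arbitrary installations cannot rely on it.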

Now, rather than blaming the Unicode people, the Turkish language, or whatever:

How are we supposed to code properly under these circumstances?

My proposal:
strtoupper() and strtolower() claim to be locale aware, but they really aren't (try to uppercase the umlauts äöü under locale de_DE; it doesn't work at all). Thus, redefining these functions to work for 7-bit encoded (US-ASCII) data _only_ won't change anything. Drop the locale dependency, and the locale tr_TR will work.
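
The proposed ASCII-only behavior can be approximated in user code today. The following is a sketch, not the PHP implementation: ascii_strtoupper() is a hypothetical helper name, and it relies on strtr() with two string arguments doing a byte-for-byte mapping that does not consult the locale.

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): uppercase only the 26
 * US-ASCII letters and leave every other byte untouched.
 */
function ascii_strtoupper($s)
{
    // strtr() maps each byte in the second argument to the byte at the
    // same position in the third argument; no locale is involved.
    return strtr($s, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
}

// May return false if tr_TR is not installed; the helper works either way.
setlocale(LC_ALL, 'tr_TR');

$upper = ascii_strtoupper('begin');
printf("%s %s\n", $upper, bin2hex($upper)); // prints "BEGIN 424547494e"
```

Because bytes outside a-z pass through unchanged, multi-byte UTF-8 sequences are not corrupted; they are simply not uppercased.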


Reproduce code:
---------------
jo@l33t ~> php -r 'setlocale(LC_ALL, "tr"); $text = "begin"; printf("%s %s\n", strtoupper($text), bin2hex(strtoupper($text))); '
BEGIN 424547494e
jo@l33t ~> php -r 'setlocale(LC_ALL, "tr_TR"); $text = "begin"; printf("%s %s\n", strtoupper($text), bin2hex(strtoupper($text))); '
BEG�N 424547dd4e
jo@l33t ~> php -r 'setlocale(LC_ALL, "tr_TR.UTF-8"); $text = "begin"; printf("%s %s\n", strtoupper($text), bin2hex(strtoupper($text))); '
BEGiN 424547694e


Expected result:
----------------
BEGIN 424547494e
(for all locales)

Actual result:
--------------
(see above)


 [2007-07-22 10:44 UTC] johannes@php.net
PHP 6 will have support for this. Won't change with PHP 5.
 