[2007-07-22 10:44 UTC] johannes@php.net
Description:
------------
I stumbled upon this issue while debugging a project that uses PEAR XML_Parser, which relies on the assumption that strtoupper() works for all US-ASCII characters regardless of locale or encoding. Unfortunately, it does not.

This has been discussed in PHP bugs #22003, #21771, and #35583.

The letter i isn't converted to I by strtoupper() when the locale is set to Turkish via setlocale(LC_ALL, 'tr_TR').

However, the letter i is converted to I by strtoupper() when the locale is set to Turkish via setlocale(LC_ALL, 'tr').

mb_strtoupper() does work under all encodings.

Now, rather than blaming the Unicode people, the Turkish language, or whatever else: how are we supposed to code properly under these circumstances?

My proposal: strtoupper() and strtolower() claim to be locale aware, but they really aren't (try to uppercase umlauts under the de_DE locale; it doesn't work at all). Thus, redefining these functions to work on 7-bit (US-ASCII) data _only_ won't change anything in practice. Drop the locale dependency, and the tr_TR locale will work.

Reproduce code:
---------------
jo@l33t ~> php -r 'setlocale(LC_ALL, "tr"); $text = "begin"; printf("%s %s\n", strtoupper($text), bin2hex(strtoupper($text)));'
BEGIN 424547494e

jo@l33t ~> php -r 'setlocale(LC_ALL, "tr_TR"); $text = "begin"; printf("%s %s\n", strtoupper($text), bin2hex(strtoupper($text)));'
BEG�N 424547dd4e

jo@l33t ~> php -r 'setlocale(LC_ALL, "tr_TR.UTF-8"); $text = "begin"; printf("%s %s\n", strtoupper($text), bin2hex(strtoupper($text)));'
BEGiN 424547694e

Expected result:
----------------
BEGIN 424547494e (for all locales)

Actual result:
--------------
(see above)
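
Workaround sketch:
------------------
Until the locale dependency is dropped, code that only needs US-ASCII case folding (as XML_Parser appears to) could sidestep strtoupper() entirely. The following is a minimal sketch, not a proposed patch: ascii_strtoupper() is a hypothetical helper name, not part of PHP or PEAR. It relies on the three-argument form of strtr(), which translates byte by byte and does not consult the current locale, so bytes outside a-z are passed through unchanged.

```php
<?php
// Hypothetical helper (not part of PHP or PEAR): uppercase only the
// 26 US-ASCII letters a-z, ignoring whatever locale is currently set.
function ascii_strtoupper(string $s): string
{
    // Three-argument strtr() maps each byte in the second argument to the
    // byte at the same position in the third; all other bytes are untouched.
    return strtr(
        $s,
        'abcdefghijklmnopqrstuvwxyz',
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    );
}

// Same check as in the reproduce code. setlocale() returns false when the
// requested locale is not installed, which does not affect the result here.
setlocale(LC_ALL, 'tr_TR');
$text = 'begin';
printf("%s %s\n", ascii_strtoupper($text), bin2hex(ascii_strtoupper($text)));
// prints: BEGIN 424547494e
```

The trade-off is deliberate: non-ASCII letters are never converted, which matches the "7-bit data only" behaviour proposed above. Callers that need real multibyte case conversion would still use mb_strtoupper().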