Bug #42526 Broken classes and method names in Turkish locale
Submitted: 2007-09-03 06:32 UTC
Modified: 2007-09-03 07:56 UTC
From: tokul at users dot sourceforge dot net
Assigned:
Status: Not a bug
Package: Scripting Engine problem
PHP Version: 5CVS-2007-09-03 (snap)
OS: Linux Debian Etch
Private report: No
CVE-ID: None
 [2007-09-03 06:32 UTC] tokul at users dot sourceforge dot net
Description:
------------
PHP's zend_str_tolower() is locale-insensitive on Windows and locale-sensitive (LC_CTYPE) on other platforms. This causes case-sensitivity problems with class and method names under a Turkish locale on Linux, because Turkish casing rules differ from US English ones: in Turkish, the lowercase form of Latin capital letter I is the dotless ı, not i. Programmers mistakenly assume that Latin small letter i always corresponds to Latin capital letter I.
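
To make the locale dependence concrete, here is a minimal C sketch that asks the C library what tolower('I') returns under a given LC_CTYPE locale. The function name probe_tolower_I is hypothetical and is not part of PHP or the Zend engine. The result for a Turkish locale depends on the C library and on whether the locale is installed, so no particular value is claimed here.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical probe (not PHP code): returns tolower('I') as seen under
 * the named LC_CTYPE locale, or -1 if the locale cannot be selected.
 * The previously active locale is restored before returning. */
int probe_tolower_I(const char *locale_name)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved;
    int result = -1;

    if (current == NULL) {
        return -1;
    }
    /* Copy the name: later setlocale() calls may overwrite the buffer. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL) {
        return -1;
    }
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, locale_name) != NULL) {
        result = tolower('I');
    }

    setlocale(LC_CTYPE, saved);
    free(saved);
    return result;
}
```

Calling probe_tolower_I("C") yields 'i'. Calling it with "tr_TR.UTF-8" on an affected system is expected to yield something other than 'i', which is the mismatch described above.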

The issue can be fixed by writing a tolower() alternative that is not LC_CTYPE-sensitive. My tests show that when zend_str_tolower() is modified to do locale-insensitive case conversion, a basic PHP install passes 'make test' without any additional failures, and Zend/bench.php shows no performance decrease.

My code modifications are based on a program licensed under the GPL. A locale-insensitive tolower() takes only 5 lines of C. I can post a link to the patch if you can use GPL code. If you can't, I can explain to your programmers how a locale-insensitive tolower() works, and you can write your own function.
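
For illustration only, a locale-insensitive lowercase conversion might look like the sketch below. It is neither the GPL-derived patch mentioned above nor PHP's actual implementation. The names ascii_tolower, ascii_str_tolower and ascii_tolower_selfcheck are hypothetical, and the sketch assumes that only the ASCII letters A-Z need to be folded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lowercase one byte using ASCII rules only.
 * Unlike tolower(), it never consults LC_CTYPE. */
unsigned char ascii_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: lowercase a buffer in place, ASCII rules only. */
void ascii_str_tolower(char *str, size_t length)
{
    size_t i;

    for (i = 0; i < length; i++) {
        str[i] = (char)ascii_tolower((unsigned char)str[i]);
    }
}

/* Usage check: 'I' maps to 'i'; bytes outside A-Z pass through unchanged. */
void ascii_tolower_selfcheck(void)
{
    char name[] = "TestItToo";

    ascii_str_tolower(name, strlen(name));
    assert(strcmp(name, "testittoo") == 0);
    assert(ascii_tolower('I') == 'i');
    assert(ascii_tolower(0xC4) == 0xC4);
}
```

Because the mapping is a fixed range check, the result is the same whatever locale setlocale() has selected, so a name lowercased before the locale change still matches the same name lowercased after it.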

I know about bug reports 40086, 35583, 35050 and 18556, but they are closed, and it seems strange not to fix the issue when the interpreter works correctly on Windows and BSD and the fix takes fewer than 10 lines.

Tested with the PHP 5.2-200709030430 and PHP6-200709030430 (unicode.semantics = off) snapshots. PHP6-200709030430 with unicode.semantics = on is not affected.

If you have another open bug report on this issue, please give its number.

Reproduce code:
---------------
<?php
class TestIt { }

class testclass2 {
    function TestItToo() {}
}

var_dump(setlocale(LC_ALL,'tr_TR.UTF-8'));
var_dump(class_exists('TestIt'));
$TestObj = new testclass2();
var_dump(method_exists($TestObj,'TestItToo'));

Expected result:
----------------
string(11) "tr_TR.UTF-8"
bool(true)
bool(true)

Actual result:
--------------
string(11) "tr_TR.UTF-8"
bool(false)
bool(false)

 [2007-09-03 07:56 UTC] jani@php.net
Please do not submit the same bug more than once. An existing
bug report already describes this very problem. Even if you feel
that your issue is somewhat different, the resolution is likely
to be the same. 

Thank you for your interest in PHP.

Same as bug #35050 (read the last comment..)
 