Bug #68414 Yet another strtolower/upper problem in tr_TR
Submitted: 2014-11-13 14:02 UTC Modified: 2014-11-17 10:33 UTC
From: julien at palard dot fr Assigned:
Status: Wont fix Package: mbstring related
PHP Version: 5.5.18 OS: all
Private report: No CVE-ID: None
 [2014-11-13 14:02 UTC] julien at palard dot fr
Description:
------------
Lots of bugs have been opened on this subject, and lots of them have been closed, resolved, etc.: https://bugs.php.net/search.php?search_for=turkish&boolean=0&limit=30&order_by=&direction=DESC&cmd=display&status=All&bug_type=All&project=All&php_os=&phpver=&cve_id=&assign=&author_email=&bug_age=0&bug_updated=0

Those bugs typically address ISO-8859-9, the Latin-5 character encoding used in Turkey; this one addresses the UTF-8 encoding.

In Turkish:
SMALL I capitalized is CAPITAL I WITH DOT ABOVE (and vice versa)
SMALL DOTLESS I capitalized is CAPITAL I (and vice versa)

In PHP I get (see test script):
Expects CAPITAL I WITH DOT ABOVE: iI
Expects SMALL I WITHOUT DOT: Ii

All four calls return a result that is wrong for Turkish.

Test script:
---------------
<?php

mb_internal_encoding('UTF-8');
setlocale(LC_ALL, 'tr_TR.UTF-8');
echo "Expects CAPITAL I WITH DOT ABOVE: " . strtoupper('i') . mb_strtoupper('i', 'UTF-8') . PHP_EOL;
echo "Expects SMALL I WITHOUT DOT: " . strtolower('I') . mb_strtolower('I', 'UTF-8') . PHP_EOL;

Expected result:
----------------
Expects CAPITAL I WITH DOT ABOVE: İİ
Expects SMALL I WITHOUT DOT: ıı

Actual result:
--------------
Expects CAPITAL I WITH DOT ABOVE: iI
Expects SMALL I WITHOUT DOT: Ii

 [2014-11-13 19:35 UTC] aharvey@php.net
-Status: Open
+Status: Wont fix
-Package: *Languages/Translation
+Package: mbstring related
 [2014-11-13 19:35 UTC] aharvey@php.net
Ultimately, mbstring just isn't locale-aware. Although there's a hack in the code to handle ISO-8859-9 strings differently, I don't really think there's a sensible way to handle this in the general case given the limitations of mbstring's API.
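
To illustrate the point about locale awareness, here is a minimal sketch (assuming the mbstring extension is loaded). The mb_* results are the same whether or not the tr_TR.UTF-8 locale is installed and active:

```php
<?php
// setlocale() returns false when tr_TR.UTF-8 is not installed on the system;
// either way, the mbstring calls below ignore the locale.
$locale = setlocale(LC_ALL, 'tr_TR.UTF-8');

$upper = mb_strtoupper('i', 'UTF-8'); // "I", not U+0130
$lower = mb_strtolower('I', 'UTF-8'); // "i", not U+0131

echo 'locale: ' . ($locale === false ? 'not available' : $locale) . PHP_EOL;
echo 'mb_strtoupper("i"): ' . $upper . PHP_EOL;
echo 'mb_strtolower("I"): ' . $lower . PHP_EOL;
```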
 [2014-11-17 10:33 UTC] julien at palard dot fr
I understand that mb_* functions are not locale-aware, so let's stick to:

mb_strtoupper('i', 'UTF-8') -> I
mb_strtolower('I', 'UTF-8') -> i

And I understand that strtolower and strtoupper are not multibyte-aware, so they just can't work with UTF-8.

Maybe PHP needs a set of locale-aware multibyte functions?
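
As a rough idea of what such functions could look like in userland, here is a minimal sketch, assuming valid UTF-8 input and the mbstring extension. The names tr_strtoupper and tr_strtolower are hypothetical, not part of PHP. They swap the two Turkish-specific letters by hand and leave everything else to mb_strtoupper / mb_strtolower:

```php
<?php
// Hypothetical helpers for Turkish casing; not part of PHP or mbstring.
// UTF-8 bytes: "\xC4\xB0" is U+0130 (capital I with dot above),
//              "\xC4\xB1" is U+0131 (small dotless i).

function tr_strtoupper($s)
{
    // Map i -> U+0130 and U+0131 -> I first, then uppercase the rest.
    $s = str_replace(array('i', "\xC4\xB1"), array("\xC4\xB0", 'I'), $s);
    return mb_strtoupper($s, 'UTF-8');
}

function tr_strtolower($s)
{
    // Map I -> U+0131 and U+0130 -> i first, then lowercase the rest.
    $s = str_replace(array('I', "\xC4\xB0"), array("\xC4\xB1", 'i'), $s);
    return mb_strtolower($s, 'UTF-8');
}

echo tr_strtoupper('istanbul') . PHP_EOL; // İSTANBUL
echo tr_strtolower('I') . PHP_EOL;        // ı
```

This only covers the two letter pairs from the description; it does not consult setlocale(), so the caller has to decide when Turkish rules apply.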
 