Bug #68414 Yet another strtolower/upper problem in tr_TR
Submitted: 2014-11-13 14:02 UTC Modified: 2014-11-17 10:33 UTC
From: julien at palard dot fr Assigned:
Status: Wont fix Package: mbstring related
PHP Version: 5.5.18 OS: all
Private report: No CVE-ID: None

 [2014-11-13 14:02 UTC] julien at palard dot fr
Description:
------------
Lots of bugs have been opened on this subject, and lots of them have been closed, resolved, etc.: https://bugs.php.net/search.php?search_for=turkish&boolean=0&limit=30&order_by=&direction=DESC&cmd=display&status=All&bug_type=All&project=All&php_os=&phpver=&cve_id=&assign=&author_email=&bug_age=0&bug_updated=0

Those bugs typically address ISO-8859-9, the Latin-5 character encoding used in Turkey. This one addresses the UTF-8 encoding:

In Turkish:
SMALL I capitalized is CAPITAL I WITH DOT ABOVE (and vice versa)
SMALL DOTLESS I capitalized is CAPITAL I (and vice versa)

In PHP I get (see the test script):
Expects CAPITAL I WITH DOT ABOVE: iI
Expects SMALL I WITHOUT DOT: Ii

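All four calls return the wrong result for Turkish: strtoupper('i') and strtolower('I') leave the string unchanged, while mb_strtoupper('i') and mb_strtolower('I') apply the non-Turkish mapping between i and I.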

Test script:
---------------
<?php

mb_internal_encoding('UTF-8');
setlocale(LC_ALL, 'tr_TR.UTF-8');
echo "Expects CAPITAL I WITH DOT ABOVE: " . strtoupper('i') . mb_strtoupper('i', 'UTF-8') . PHP_EOL;
echo "Expects SMALL I WITHOUT DOT: " . strtolower('I') . mb_strtolower('I', 'UTF-8') . PHP_EOL;

Expected result:
----------------
Expects CAPITAL I WITH DOT ABOVE: İİ
Expects SMALL I WITHOUT DOT: ıı

Actual result:
--------------
Expects CAPITAL I WITH DOT ABOVE: iI
Expects SMALL I WITHOUT DOT: Ii

 [2014-11-13 19:35 UTC] aharvey@php.net
-Status: Open
+Status: Wont fix
-Package: *Languages/Translation
+Package: mbstring related
 [2014-11-13 19:35 UTC] aharvey@php.net
Ultimately, mbstring just isn't locale aware. Although there's a hack in the code to handle ISO-8859-9 strings differently, I don't really think there's a sensible way to handle this in the general case given the limitations of mbstring's API.
 [2014-11-17 10:33 UTC] julien at palard dot fr
I understand that the mb_* functions are not locale aware, so let's stick to:

mb_strtoupper('i', 'UTF-8') -> I
mb_strtolower('I', 'UTF-8') -> i

And I understand that strtolower and strtoupper are not multibyte aware, so they just can't work with UTF-8.
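
To illustrate why a byte-oriented function cannot produce the Turkish result, here is a minimal sketch that compares byte length and character length. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable name is only for illustration.

```php
<?php
// CAPITAL I WITH DOT ABOVE (U+0130), written as a UTF-8 literal.
$dotted = 'İ';

// 'i' is one byte, but its Turkish uppercase form needs two bytes in UTF-8,
// so a function that maps one byte to one byte cannot return it.
echo strlen('i'), ' ', strlen($dotted), ' ', mb_strlen($dotted, 'UTF-8'), PHP_EOL; // prints "1 2 1"
```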

Maybe PHP needs a set of locale-aware multibyte functions?
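
Until something like that exists, one possible userland workaround is to map the two Turkish-specific letters explicitly before calling the locale-unaware mb_* functions. The sketch below is only an illustration: tr_strtoupper and tr_strtolower are hypothetical helper names, not part of PHP or mbstring, and the code assumes UTF-8 input, a UTF-8 source file, and the mbstring extension.

```php
<?php
// Hypothetical helpers (not a PHP or mbstring API): Turkish-aware case
// conversion for UTF-8 strings.

function tr_strtoupper($s)
{
    // Turkish rules: i -> İ (U+0130) and ı (U+0131) -> I.
    // Apply them first, then let mb_strtoupper handle every other letter.
    return mb_strtoupper(strtr($s, array('i' => 'İ', 'ı' => 'I')), 'UTF-8');
}

function tr_strtolower($s)
{
    // Turkish rules: I -> ı (U+0131) and İ (U+0130) -> i.
    // After this step no I or İ is left for mb_strtolower to map differently.
    return mb_strtolower(strtr($s, array('I' => 'ı', 'İ' => 'i')), 'UTF-8');
}

echo tr_strtoupper('i'), tr_strtolower('I'), PHP_EOL; // prints "İı"
```

Because the mapping is hard-coded, these helpers ignore setlocale() entirely and should only be called on text known to be Turkish (or another language with the same dotted and dotless i rules).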
 