Request #70072 Add a language-aware uppercase/lowercase solution without the need for a locale change
Submitted: 2015-07-14 13:14 UTC Modified: 2017-07-28 10:47 UTC
Votes:8
Avg. Score:4.4 ± 0.9
Reproduced:7 of 8 (87.5%)
Same Version:2 (28.6%)
Same OS:2 (28.6%)
From: ongun dot kanat at gmail dot com Assigned:
Status: Open Package: mbstring related
PHP Version: Irrelevant OS: any
Private report: No CVE-ID: None
 [2015-07-14 13:14 UTC] ongun dot kanat at gmail dot com
Description:
------------
Using several languages on one page at the same time is common when your clients work globally or multilingually. Developers like me sometimes struggle to implement a page that uses or modifies strings from many languages at once, and changing the locale of the whole page for a single string makes little sense. It would be great to have a parameter that selects the language for mbstring uppercase/lowercase operations.

Example case
I am displaying a page in English that also contains strings in the languages below:
ß -> SS in German, ı -> I in Turkish

Unicode already defines case mappings for characters, including the special cases.
http://unicode.org/faq/casemap_charprop.html#1

Test script:
---------------
<?php
mb_internal_encoding("UTF-8");
echo mb_strtoupper("Maß\n");  // "MAß"
echo mb_strtolower("MAẞ\n");  // "maß"

echo mb_strtoupper("ilaç\n"); // "ILAÇ"
echo mb_strtolower("IŞIK\n"); // "iŞik"
?>

Expected result:
----------------
MASS // Eszett
maß
İLAÇ // Turkish I issue
ışık

Actual result:
--------------
MAß
maß
ILAÇ
iŞik

History

 [2017-07-28 10:47 UTC] nikic@php.net
In master the first two cases will now work correctly. The latter two do not, as these are language-specific conditional special case mappings, which are currently not implemented.
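Until language-specific mappings exist, one possible userland workaround for the Turkish cases is to remap the dotted and dotless i before calling mbstring. This is only a minimal sketch: tr_strtoupper() and tr_strtolower() are hypothetical helper names, not part of mbstring, and the sketch assumes UTF-8 input and covers only Turkish.

```php
<?php
// Hypothetical helpers (NOT part of mbstring); assume UTF-8 input.

function tr_strtoupper(string $s): string
{
    // Turkish rules: i -> İ and ı -> I. Remap these two letters first,
    // then let mb_strtoupper() handle every other character.
    return mb_strtoupper(strtr($s, ['i' => 'İ', 'ı' => 'I']), 'UTF-8');
}

function tr_strtolower(string $s): string
{
    // Turkish rules: İ -> i and I -> ı. Remap first, then lowercase the rest.
    return mb_strtolower(strtr($s, ['İ' => 'i', 'I' => 'ı']), 'UTF-8');
}

echo tr_strtoupper("ilaç"), "\n"; // İLAÇ
echo tr_strtolower("IŞIK"), "\n"; // ışık
```

Because the remapping runs before mbstring sees the string, the result does not depend on the locale of the page, which is the point of this request.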
 [2024-05-22 13:00 UTC] turabgarip at gmail dot com
This is still relevant in PHP 8.3.7

The behavior is very unpredictable: it depends on the position of the letters and on the neighboring letters in a word. If the resulting string were at least consistent, we could accept what the PHP team commented earlier; but since the result is neither predictable nor consistent, this is, IMHO, a legitimate bug.

echo mb_convert_case('içecek', MB_CASE_LOWER); // prints normally
echo mb_convert_case('İçecek', MB_CASE_LOWER); // prints something over the "ç" that looks like a cedilla but is not. I can't paste it here because this form strips that character.

In fact, it is a multibyte character attached to the "i", not to the letter next to it.
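Since the form strips the character, dumping the bytes may be a clearer way to show it. A small sketch, assuming PHP 7.3 or later, where MB_CASE_LOWER applies full Unicode case mapping: there the lowercase of "İ" (U+0130) should come out as "i" followed by U+0307 COMBINING DOT ABOVE, which is the mark that a font may draw over the neighboring letter.

```php
<?php
// Lowercase a single "İ" (U+0130) and inspect the raw result.
$lower = mb_convert_case('İ', MB_CASE_LOWER, 'UTF-8');

echo bin2hex($lower), "\n";             // 69cc87 = "i" + U+0307 on PHP >= 7.3
echo mb_strlen($lower, 'UTF-8'), "\n";  // 2 code points instead of 1
```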

echo mb_convert_case('ıi', MB_CASE_UPPER); // prints: II

This is already well-known.

echo mb_convert_case('İÇİŞLERİ', MB_CASE_LOWER); // prints that strange character throughout, distorting the whole word.
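If the goal is only to avoid the extra combining mark, the simple (one-to-one) mapping mode may be enough. This is a sketch assuming PHP 7.3 or later, where the MB_CASE_LOWER_SIMPLE constant is available; it is not a language-aware solution, because a plain "I" would still become "i" and not "ı".

```php
<?php
// Simple case mapping maps "İ" (U+0130) to a plain "i" with no combining mark.
$simple = mb_convert_case('İÇİŞLERİ', MB_CASE_LOWER_SIMPLE, 'UTF-8');

echo $simple, "\n"; // içişleri
```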
 
Last updated: Thu Nov 21 13:01:29 2024 UTC