Request #70072 Add a language-aware uppercase/lowercase solution that does not require changing the locale
Submitted: 2015-07-14 13:14 UTC Modified: 2017-07-28 10:47 UTC
Votes:8
Avg. Score:4.4 ± 0.9
Reproduced:7 of 8 (87.5%)
Same Version:2 (28.6%)
Same OS:2 (28.6%)
From: ongun dot kanat at gmail dot com Assigned:
Status: Open Package: mbstring related
PHP Version: Irrelevant OS: any
Private report: No CVE-ID: None
 [2015-07-14 13:14 UTC] ongun dot kanat at gmail dot com
Description:
------------
Using many languages on one page (at the same time) is common when your clients work globally or multilingually. Developers like me sometimes struggle to implement a page that uses or modifies strings in several languages at once, and changing the locale of the whole page for just one string is pointless. It would be great to have a parameter that selects the language for mbstring uppercase/lowercase operations.

Example case:
I am displaying a page in English but also displaying strings in the following languages:
ß -> SS in German; i -> İ and I -> ı in Turkish

Unicode already defines case mappings for characters, including these special cases:
http://unicode.org/faq/casemap_charprop.html#1

Test script:
---------------
<?php
mb_internal_encoding("UTF-8");
echo mb_strtoupper("Maß\n");  // "MAß"
echo mb_strtolower("MAẞ\n");  // "maß"

echo mb_strtoupper("ilaç\n"); // "ILAÇ"
echo mb_strtolower("IŞIK\n"); // "iŞik"
?>

Expected result:
----------------
MASS // Eszett
maß
İLAÇ // Turkish I issue
ışık

Actual result:
--------------
MAß
maß
ILAÇ
iŞik
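Until mbstring accepts a language parameter, one possible userland workaround is to apply the Turkish/Azeri dotted and dotless i rules (the language-specific entries in Unicode's SpecialCasing.txt) before delegating to mbstring's language-neutral mapping. This is a minimal sketch assuming PHP >= 7.3 with mbstring; lang_strtoupper(), lang_strtolower() and the $lang codes are hypothetical names, not an existing API, and only the Turkish/Azeri i/I rules are covered (other language-specific rules, such as Lithuanian ones, are not).

```php
<?php
// Hypothetical userland helpers, NOT part of mbstring. They pre-apply the
// Turkish/Azeri i/I rules, then use mbstring's default case mapping.

function lang_strtoupper(string $str, string $lang): string
{
    if ($lang === 'tr' || $lang === 'az') {
        // Turkish/Azeri: i (U+0069) uppercases to İ (U+0130).
        $str = str_replace('i', 'İ', $str);
    }
    return mb_strtoupper($str, 'UTF-8');
}

function lang_strtolower(string $str, string $lang): string
{
    if ($lang === 'tr' || $lang === 'az') {
        // Turkish/Azeri: I (U+0049) lowercases to ı (U+0131), İ (U+0130) to i.
        // Mapping İ to plain i here also avoids the extra combining character
        // that the language-neutral mapping appends after "i".
        $str = str_replace(['I', 'İ'], ['ı', 'i'], $str);
    }
    return mb_strtolower($str, 'UTF-8');
}

echo lang_strtoupper("ilaç", 'tr'), "\n"; // İLAÇ
echo lang_strtolower("IŞIK", 'tr'), "\n"; // ışık
echo lang_strtoupper("Maß", 'de'), "\n";  // MASS (full case mapping, PHP >= 7.3)
```

Byte-wise str_replace() is safe on valid UTF-8 here, because the ASCII bytes for i and I never occur inside a multibyte sequence.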

 [2017-07-28 10:47 UTC] nikic@php.net
In master the first two cases will now work correctly. The latter two do not, as these are language-specific conditional special case mappings, which are currently not implemented.
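The behavior nikic describes can be checked with a short script, assuming PHP >= 7.3 with mbstring (full case mapping and the MB_CASE_*_SIMPLE constants arrived in 7.3). The default modes apply the unconditional entries of SpecialCasing.txt (ß -> SS), the *_SIMPLE modes keep the one-to-one mapping, and the Turkish rows still get the language-neutral result.

```php
<?php
// Requires PHP >= 7.3 with the mbstring extension.
mb_internal_encoding('UTF-8');

// Unconditional special casing is applied by default (full case mapping).
echo mb_strtoupper('Maß'), "\n";                         // MASS
echo mb_convert_case('Maß', MB_CASE_UPPER_SIMPLE), "\n"; // MAß (simple 1:1 mapping)
echo mb_strtolower('MAẞ'), "\n";                         // maß

// Language-specific (conditional) mappings are not applied, so Turkish
// text still receives the language-neutral mapping.
echo mb_strtoupper('ilaç'), "\n";                        // ILAÇ, not İLAÇ
echo mb_strtolower('IŞIK'), "\n";                        // işik, not ışık
```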
 [2024-05-22 13:00 UTC] turabgarip at gmail dot com
This is still relevant in PHP 8.3.7

The behavior seems very unpredictable: it depends on the position of the letters and on the neighboring letters in a word. If the resulting string were at least consistent, we could accept the PHP team's earlier explanation; but since it is neither predictable nor consistent, this is, IMHO, a legitimate bug.

echo mb_convert_case('içecek', MB_CASE_LOWER); // prints normally
echo mb_convert_case('İçecek', MB_CASE_LOWER); // prints a mark over the "ç" that looks like a cedilla but is not one. I can't paste it here because this form strips that character.

In fact, it is a multibyte character attached to the "i", not to the letter next to it.

echo mb_convert_case('ıi', MB_CASE_UPPER); // prints: II

This is already well-known.

echo mb_convert_case('İÇİŞLERİ', MB_CASE_LOWER); // prints that strange character everywhere, distorting the whole word.
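The "strange character" is most likely U+0307 COMBINING DOT ABOVE. In Unicode's SpecialCasing.txt, İ (U+0130) has the unconditional full lowercase mapping U+0069 U+0307, which MB_CASE_LOWER applies; a renderer may then draw that dot over a neighboring glyph. This sketch, assuming PHP >= 7.3 with mbstring, makes the extra code point visible with bin2hex() and compares it with MB_CASE_LOWER_SIMPLE, which uses the one-to-one mapping İ -> i. Neither mode produces the Turkish I -> ı.

```php
<?php
// Requires PHP >= 7.3 with the mbstring extension.
mb_internal_encoding('UTF-8');

$full   = mb_convert_case('İ', MB_CASE_LOWER);        // full mapping: "i" + U+0307
$simple = mb_convert_case('İ', MB_CASE_LOWER_SIMPLE); // simple mapping: "i"

echo bin2hex($full), "\n";   // 69cc87 ("i" followed by COMBINING DOT ABOVE)
echo bin2hex($simple), "\n"; // 69

// The word from the comment: 8 letters, 3 of them İ.
echo mb_convert_case('İÇİŞLERİ', MB_CASE_LOWER_SIMPLE), "\n";     // içişleri
echo mb_strlen(mb_convert_case('İÇİŞLERİ', MB_CASE_LOWER)), "\n"; // 11: one extra code point per İ
```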
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 17:01:32 2024 UTC