Bug #79617 mb_convert_case() with MB_CASE_TITLE unexpected behavior for second characters
Submitted: 2020-05-21 17:41 UTC Modified: 2020-05-21 19:59 UTC
Votes:1
Avg. Score:2.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: dedienaar+phpnet at gmail dot com Assigned: nikic
Status: Not a bug Package: *Unicode Issues
PHP Version: 7.2.31 OS: Ubuntu 18.04.4 LTS
Private report: No CVE-ID: None
 [2020-05-21 17:41 UTC] dedienaar+phpnet at gmail dot com
Description:
------------
This relates to:
- https://www.php.net/manual/en/function.mb-convert-case
- https://github.com/php/php-src/blob/35e0a91db717fe441a89ca9554d8843d8ee63112/ext/mbstring/mbstring.c#L2674-L2709
- https://github.com/laravel/framework/issues/32910

When converting case with `MB_CASE_TITLE`, `mb_convert_case()` does not capitalize a letter that directly follows an apostrophe, a quote, or certain other special characters (for example ʾ and ʿ). The letter is left lowercase, or is lowercased if it was uppercase.

Test script:
---------------
>>> var_dump(mb_convert_case('al-fātiḥah', MB_CASE_TITLE, 'UTF-8'));
=> string(13) "Al-Fātiḥah" // <-- GOOD
>>> var_dump(mb_convert_case('AL-FĀTIḤAH', MB_CASE_TITLE, 'UTF-8'));
=> string(13) "Al-Fātiḥah" // <-- GOOD

>>> var_dump(mb_convert_case('ʾāli-ʿimrān', MB_CASE_TITLE, 'UTF-8'));
=> string(15) "ʾāli-ʿimrān" // <-- NOT GOOD: Not uppercased 'Ā' and 'I' due to preceding ʾ and ʿ
>>> var_dump(mb_convert_case('ʾĀLI-ʿIMRĀN', MB_CASE_TITLE, 'UTF-8'));
=> string(15) "ʾāli-ʿimrān" // <-- NOT GOOD: Lowercased 'Ā' and 'I' due to preceding ʾ and ʿ

>>> var_dump(mb_convert_case('aṣ-ṣāffāt', MB_CASE_TITLE, 'UTF-8'));
=> string(15) "Aṣ-Ṣāffāt" // GOOD
>>> var_dump(mb_convert_case('AṢ-ṢĀFFĀT', MB_CASE_TITLE, 'UTF-8'));
=> string(15) "Aṣ-Ṣāffāt" // GOOD

>>> var_dump(mb_convert_case('ṭāʾ hāʾ', MB_CASE_TITLE, 'UTF-8'));
=> string(13) "Ṭāʾ Hāʾ" // GOOD
>>> var_dump(mb_convert_case('ṬĀʾ HĀʾ', MB_CASE_TITLE, 'UTF-8'));
=> string(13) "Ṭāʾ Hāʾ" // GOOD

>>> var_dump(mb_convert_case('ʾibrāhīm', MB_CASE_TITLE, 'UTF-8'));
=> string(11) "ʾibrāhīm" // <-- NOT GOOD: Not uppercased 'I' due to preceding ʾ
>>> var_dump(mb_convert_case('ʾIBRĀHĪM', MB_CASE_TITLE, 'UTF-8'));
=> string(11) "ʾibrāhīm" // <-- NOT GOOD: Lowercased 'I' due to preceding ʾ
>>> var_dump(mb_convert_case('\'ibrāhīm', MB_CASE_TITLE, 'UTF-8'));
=> string(10) "'ibrāhīm" // <-- NOT GOOD: Not uppercased 'I' due to preceding apostrophe
>>> var_dump(mb_convert_case('`ibrāhīm', MB_CASE_TITLE, 'UTF-8'));
=> string(10) "`ibrāhīm" // <-- NOT GOOD: Not uppercased 'I' due to preceding backtick
>>> var_dump(mb_convert_case('"ibrāhīm', MB_CASE_TITLE, 'UTF-8'));
=> string(10) ""ibrāhīm" // <-- NOT GOOD: Not uppercased 'I' due to preceding quote


History

 [2020-05-21 18:09 UTC] girgias@php.net
-Assigned To: +Assigned To: nikic
 [2020-05-21 18:09 UTC] girgias@php.net
I'm assigning this to Nikita just to double check, but this does not seem to me to be a bug.

Copy-pasting the character that you are indicating to be a quotation mark (in which case it should indeed capitalize the first letter) into Google, I come up with the code point for "Modifier Letter Right Half Ring" (U+02BE), which is not a quotation mark and is meant to modify the previous code point.

Thus the string is not valid UTF-8, as far as my limited knowledge of it goes, and the behaviour is totally expected.

Moreover, a single apostrophe/quotation mark MUST NOT capitalize the letter after it; otherwise we would get stupid text such as "Isn'T valid", which is and should be "Isn't".

The backtick follows the same principle: you can't just have a single one and expect it to be recognized as a possible quotation mark (which is already highly debatable).
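
The claims in the comment above can be checked directly. The following is a minimal sketch, not part of the original report, assuming PHP 7.2 or later with mbstring enabled and a PCRE build with Unicode property support. It looks up the code point of the character, tests whether the reported input is well-formed UTF-8, and tests whether the character is a modifier letter (general category Lm) or a combining mark (category M). The variable names are illustrative only.

```php
<?php
// Hypothetical check; none of these variable names come from the report.
$ring  = "\u{02BE}";                          // the character preceding 'ibrāhīm' in the test script
$input = $ring . "ibr\u{0101}h\u{012B}m";     // 'ʾibrāhīm' written with escapes

// Is the reported input well-formed UTF-8?
$isValidUtf8 = mb_check_encoding($input, 'UTF-8');

// Which code point is the leading character?
$codePoint = mb_ord($ring, 'UTF-8');

// Unicode general category: spacing modifier letter (Lm) versus combining mark (M).
$isModifierLetter = preg_match('/^\p{Lm}$/u', $ring) === 1;
$isCombiningMark  = preg_match('/^\p{M}$/u', $ring) === 1;

printf("valid UTF-8:     %s\n", var_export($isValidUtf8, true));
printf("code point:      U+%04X\n", $codePoint);
printf("modifier letter: %s\n", var_export($isModifierLetter, true));
printf("combining mark:  %s\n", var_export($isCombiningMark, true));
```

A check like this separates two questions that are easy to conflate: whether the string is well-formed, and how the case-mapping code classifies the character.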
 [2020-05-21 19:52 UTC] nikic@php.net
-Status: Assigned +Status: Not a bug
 [2020-05-21 19:52 UTC] nikic@php.net
I rewrote title case folding in PHP 7.3, and as far as I can see everything folds correctly there: https://3v4l.org/8I587

Please use a recent version of PHP.
 [2020-05-21 19:59 UTC] nikic@php.net
For reference, the relevant fix in 7.3 was https://github.com/php/php-src/commit/f4a1d9c8211fa7878af14d0bd94b2deaab19ae21.
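
To compare a local installation against the behaviour described in this report, something like the following sketch may help. It assumes mbstring is enabled; `titleCase()` is a hypothetical helper, and the inputs are copied from the test script above plus the "isn't" example from the comments. It prints the results without asserting what they should be, since that depends on the PHP version in use.

```php
<?php
// Sketch only: titleCase() is a hypothetical helper, and the inputs are
// copied from the test script and comments in this report.
function titleCase(string $s): string
{
    return mb_convert_case($s, MB_CASE_TITLE, 'UTF-8');
}

$inputs = [
    'ʾāli-ʿimrān',
    'ʾĀLI-ʿIMRĀN',
    'ʾibrāhīm',
    "'ibrāhīm",
    '`ibrāhīm',
    '"ibrāhīm',
    "isn't valid",
];

// The report was filed against 7.2.31; the rewrite mentioned above landed in 7.3.
if (PHP_VERSION_ID < 70300) {
    echo "PHP ", PHP_VERSION, " predates the 7.3 rewrite; results may match the report.\n";
}

foreach ($inputs as $input) {
    printf("%s => %s\n", $input, titleCase($input));
}
```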
 