Bug #76696 Lower-case ß is capitalized as SS instead of ẞ
Submitted: 2018-08-02 16:26 UTC
Modified: 2018-08-02 18:55 UTC
From: tim dot eulitz at gmail dot com
Assigned:
Status: Not a bug
Package: mbstring related
PHP Version: 7.3.0beta1
OS:
Private report: No
CVE-ID: None
 [2018-08-02 16:26 UTC] tim dot eulitz at gmail dot com
Description:
------------
I have just read the changelog for PHP 7.3beta1 and noticed this example in the mb_strtoupper changes:

>Support for full case-mapping and case-folding has been added. Unlike simple case-mapping, full case-mapping may change the length of the string. For example:
>
>mb_strtoupper("Straße")
>
>// Produces STRAßE on PHP 7.2
>// Produces STRASSE on PHP 7.3

The example provided is questionable and possibly wrong, since capital ẞ was added to standard German in 2017. Instead of either "STRAßE" or "STRASSE", the correct uppercase version should likely be "STRAẞE".

mb_strtolower('ẞ') correctly lower-cases the string to 'ß', so upper-casing 'ß' to 'ẞ' would be consistent behavior in my opinion, and it would be more future-proof.

Quick Wikipedia reference: https://en.wikipedia.org/wiki/Capital_%E1%BA%9E
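
The behavior described above can be reproduced with a short script. This is a minimal sketch, assuming PHP 7.3 or later with the mbstring extension loaded and a UTF-8 source file; the variable names are illustrative only.

```php
<?php
// Full case mapping (PHP 7.3+): ß is upper-cased to the two letters SS.
$upper = mb_strtoupper('Straße', 'UTF-8');

// Capital ẞ (U+1E9E) is lower-cased to ß (U+00DF).
$lower = mb_strtolower('ẞ', 'UTF-8');

var_dump($upper === 'STRASSE'); // bool(true) on PHP 7.3+
var_dump($upper === 'STRAẞE');  // bool(false): ẞ is never produced
var_dump($lower === 'ß');       // bool(true)
```

The asymmetry is visible here: ẞ lower-cases to ß, but ß does not upper-case back to ẞ.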


 [2018-08-02 17:48 UTC] cmb@php.net
It seems to me that Unicode is not up to date[1].

[1] <http://unicode.org/faq/casemap_charprop.html#11>
 [2018-08-02 18:55 UTC] nikic@php.net
-Status: Open
+Status: Not a bug
 [2018-08-02 18:55 UTC] nikic@php.net
We're following the Unicode specification here; see in particular sections 3.13, 4.2 and 5.18 of the standard, as well as the SpecialCasing data for this particular rule. Just like everything else relating to Unicode, case mapping is hard: there are often multiple possible choices of what could be done, and it's ultimately necessary to choose one as the default behavior.

As a German myself, I'd say the decision of the Unicode consortium to capitalize "ß" to "SS" was and continues to be the correct one from the perspective of practical usage.

I also think that this is not something that Unicode will be able to change in the future (even if they wanted to), because such a change would make ß and ẞ a case pair. They are currently not a case pair, so the change would violate the case pair stability policy.
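
For code that does want a different result than the default, a rough sketch of the available choices follows. It assumes PHP 7.3 or later with mbstring and a UTF-8 source file. `MB_CASE_UPPER_SIMPLE` is the simple-mapping mode added in 7.3; `upperWithCapitalEszett()` is a hypothetical userland helper, not part of mbstring, and is only one possible workaround.

```php
<?php
// Hypothetical helper (not an mbstring function): upper-cases a string
// while mapping ß to capital ẞ (U+1E9E) instead of SS.
function upperWithCapitalEszett(string $s): string
{
    // Swap ß for ẞ first; ẞ is already uppercase, so mb_strtoupper keeps it.
    return mb_strtoupper(str_replace('ß', 'ẞ', $s), 'UTF-8');
}

// Full case mapping, the default since PHP 7.3: the string may change length.
$full = mb_convert_case('Straße', MB_CASE_UPPER, 'UTF-8');

// Simple case mapping: one code point in, one code point out; ß is unchanged.
$simple = mb_convert_case('Straße', MB_CASE_UPPER_SIMPLE, 'UTF-8');

// Application-level choice to use capital ẞ.
$custom = upperWithCapitalEszett('Straße');

echo $full, "\n";   // STRASSE
echo $simple, "\n"; // STRAßE
echo $custom, "\n"; // STRAẞE
```

The helper keeps the decision in application code, so the library default stays aligned with the Unicode data.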
 