Bug #77375 mb_strtolower carries accents to next character
Submitted: 2018-12-30 16:21 UTC
Modified: 2018-12-30 16:41 UTC
From: sjon at hortensius dot net
Assigned:
Status: Open
Package: mbstring related
PHP Version: 7.3.0
OS: archlinux
Private report: No
CVE-ID: None
 [2018-12-30 16:21 UTC] sjon at hortensius dot net
Description:
------------
I found this while going through https://3v4l.org/bughunt/7.3.0/7.2.13+7.2.12

It seems that when mb_strtolower lowercases an accented character, the accent is carried over to the next character.

Test script:
---------------
See https://3v4l.org/7pP2v and the scripts that it's based on

echo mb_strtolower("MOZAİK");

Expected result:
----------------
mozaik

Actual result:
--------------
mozai̇k

 [2018-12-30 16:41 UTC] nikic@php.net
We're following what Unicode specifies here. Excerpt from SpecialCasing:

# Preserve canonical equivalence for I with dot. Turkic is handled below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

0069 0307 is an "i" (U+0069) followed by a "combining dot above" (U+0307). The fact that the dot is rendered over the next character seems like a bug in how the text is displayed, as combining marks generally apply to the previous character.
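To see which code points mb_strtolower actually returns, independent of how a terminal or browser draws the combining mark, the result can be dumped code point by code point. This is a minimal sketch, assuming the mbstring extension is loaded and PHP 7.2 or later (for mb_ord); the variable names are illustrative only.

```php
<?php
// "\u{0130}" is LATIN CAPITAL LETTER I WITH DOT ABOVE, written as an
// escape so the example does not depend on the file's encoding.
$lower = mb_strtolower("MOZA\u{0130}K", 'UTF-8');

// Split the UTF-8 string into characters and format each as U+XXXX.
$codePoints = array_map(
    function ($ch) {
        return sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    },
    preg_split('//u', $lower, -1, PREG_SPLIT_NO_EMPTY)
);

echo implode(' ', $codePoints), "\n";
```

Going by the SpecialCasing excerpt above, a build that applies the full mapping should list U+0069 followed by U+0307 where the input had U+0130, so the dot belongs to the "i" and not to the "k".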

I'm not sure what to do here. I can definitely see how the current behavior is unwanted if you're working with Turkish text, and I'm not sure why Unicode specifies it this way.
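For text that is known to be Turkish, one possible userland workaround is to apply the Turkic mappings for the two capital I letters before calling mb_strtolower. This is only a sketch under that assumption: turkic_strtolower is a hypothetical helper, not part of mbstring, and it would give wrong results for non-Turkic text because it also maps plain "I" to dotless "ı".

```php
<?php
// Hypothetical helper (not an mbstring function): lowercases a UTF-8
// string using the Turkic rules for the two capital I letters.
function turkic_strtolower(string $s): string
{
    // Turkic rules: U+0130 (I with dot above) -> "i",
    // and "I" (U+0049) -> U+0131 (dotless i).
    // Both replacements are safe byte-wise in UTF-8.
    $s = str_replace(["\u{0130}", "I"], ["i", "\u{0131}"], $s);

    // Everything else goes through the normal mbstring mapping.
    return mb_strtolower($s, 'UTF-8');
}

echo turkic_strtolower("MOZA\u{0130}K"), "\n"; // prints "mozaik"
```

Handling the two letters up front means the combining dot above is never produced, at the cost of hard-coding a locale decision that mb_strtolower itself does not make.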
 
Last updated: Wed Jan 16 12:01:25 2019 UTC