Bug #77375 mb_strtolower carries accents to next character
Submitted: 2018-12-30 16:21 UTC Modified: 2018-12-30 16:41 UTC
From: sjon at hortensius dot net Assigned:
Status: Open Package: mbstring related
PHP Version: 7.3.0 OS: archlinux
Private report: No CVE-ID: None

 [2018-12-30 16:21 UTC] sjon at hortensius dot net
I found this while going through

It seems that the accent of a special character is carried over to the next character when using mb_strtolower.

Test script:
See and the scripts that it's based on

echo mb_strtolower("MOZAİK");

Expected result:

Actual result:


 [2018-12-30 16:41 UTC]
We're following what Unicode specifies here. Excerpt from SpecialCasing:

# Preserve canonical equivalence for I with dot. Turkic is handled below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

0069 0307 is an "i" followed by "combining dot above". The fact that the dot is rendered over the next character seems like a bug, as combining marks generally apply to the previous character.
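To check which code points a string actually contains, independent of how a terminal or browser renders the combining mark, something like the following sketch may help. `codepoints()` is a hypothetical helper written for this illustration, not part of mbstring; it assumes UTF-8 input and PHP 7.2+ (for `mb_ord`).

```php
<?php
// Hypothetical helper (not an mbstring function): list the code points
// of a UTF-8 string as "U+XXXX" labels.
function codepoints(string $s): array
{
    $out = [];
    // With the /u flag, an empty pattern splits on code point boundaries.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $out[] = sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }
    return $out;
}

// The mapping target from the SpecialCasing excerpt: "i" + combining dot above.
echo implode(' ', codepoints("i\u{0307}")), "\n"; // U+0069 U+0307

// The call from the report. What this prints depends on the PHP version and
// its case-mapping tables, so no expected output is given here.
echo implode(' ', codepoints(mb_strtolower("MOZA\u{0130}K", 'UTF-8'))), "\n";
```

If the second line shows U+0307 directly after U+0069, the string itself attaches the dot to the "i", and the misplaced dot is a display issue rather than a data issue.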

Not sure what to do here. I can definitely see how the current behavior is unwanted if you're working with Turkish text, and I'm not sure why it's specified to behave the way it does.
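For text that is known to be Turkish, one possible userland workaround is to apply the two Turkic special cases by hand before calling mb_strtolower. This is only a sketch under that assumption: `tr_strtolower()` is a hypothetical helper, not an mbstring API, and it would give wrong results for non-Turkic text, since it also turns every "I" into a dotless "ı".

```php
<?php
// Hypothetical helper (not an mbstring function): lowercase a UTF-8 string
// that is assumed to be Turkish.
function tr_strtolower(string $s): string
{
    // Handle the Turkic special cases first:
    //   U+0130 (I with dot above) -> U+0069 (i)
    //   U+0049 (I)                -> U+0131 (dotless i)
    $s = strtr($s, ["\u{0130}" => 'i', 'I' => "\u{0131}"]);

    // Everything else goes through the regular lowercasing.
    return mb_strtolower($s, 'UTF-8');
}

echo tr_strtolower("MOZA\u{0130}K"), "\n"; // mozaik
```

Because U+0130 is replaced before mb_strtolower runs, the 0069 0307 mapping from SpecialCasing never applies and no combining dot ends up in the output.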
Last updated: Wed Jan 16 12:01:25 2019 UTC