Bug #77375 mb_strtolower carries accents to next character
Submitted: 2018-12-30 16:21 UTC
Modified: 2021-04-06 11:57 UTC
From: sjon at hortensius dot net
Assigned: cmb
Status: Not a bug
Package: mbstring related
PHP Version: 7.3.0
OS: archlinux
Private report: No
CVE-ID: None
 [2018-12-30 16:21 UTC] sjon at hortensius dot net
Description:
------------
I found this while going through https://3v4l.org/bughunt/7.3.0/7.2.13+7.2.12

It seems that accents on special characters are carried over to the next character when using mb_strtolower.

Test script:
---------------
See https://3v4l.org/7pP2v and the scripts that it's based on

echo mb_strtolower("MOZAİK");

Expected result:
----------------
mozaik

Actual result:
--------------
mozai̇k

 [2018-12-30 16:41 UTC] nikic@php.net
We're following what Unicode specifies here. Excerpt from SpecialCasing:

# Preserve canonical equivalence for I with dot. Turkic is handled below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

0069 0307 is an "i" followed by "combining dot above". The fact that the dot is rendered over the next character seems like a bug, as combining marks generally apply to the previous character.

Not sure what to do here. I can definitely see how the current behavior is unwanted if you're working with Turkish text, and I'm not sure why it's specified to behave the way it does.
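
To make the mapping quoted above concrete, here is a minimal sketch that lists the code points mb_strtolower() returns for the string from the report. The helper name codePoints() is made up for this illustration, and the sketch assumes PHP 7.3 or later with ext/mbstring and UTF-8 input.

```php
<?php
// Hypothetical helper (not part of PHP): list the code points of a
// UTF-8 string in U+XXXX notation.
function codePoints(string $s): array
{
    // Split into individual code points; the /u modifier makes PCRE treat
    // the subject as UTF-8.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

    return array_map(function ($ch) {
        return sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }, $chars);
}

// "\u{0130}" is LATIN CAPITAL LETTER I WITH DOT ABOVE.
$lower = mb_strtolower("MOZA\u{0130}K", 'UTF-8');
echo implode(' ', codePoints($lower)), "\n";
```

On the versions this report is about, the list should contain U+0069 followed by U+0307 where the input had U+0130, which is the "i" plus "combining dot above" sequence from SpecialCasing.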
 [2019-01-30 14:27 UTC] goncalomm at protonmail dot com
If you need this in 7.3, you can work around it with a simple function that replaces the Turkish characters before lowercasing.


<?php
// Workaround: transliterate Turkish letters to ASCII before lowercasing.
// Note that this drops the diacritics entirely (e.g. 'ş' becomes 's').
function TurkishFix($inputText) {
    $search  = array('ç', 'Ç', 'ğ', 'Ğ', 'ı', 'İ', 'ö', 'Ö', 'ş', 'Ş', 'ü', 'Ü');
    $replace = array('c', 'C', 'g', 'G', 'i', 'I', 'o', 'O', 's', 'S', 'u', 'U');
    return str_replace($search, $replace, $inputText);
}

$var = "MOZAİK";
$string = TurkishFix($var);
echo mb_strtolower($string); // mozaik


Check here : https://3v4l.org/sYr8m
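
If the other Turkish letters should keep their diacritics, a narrower variant is possible. The following is only a sketch under the assumption that the input is known to be Turkish UTF-8 text; the function name turkishLower() is hypothetical and not part of mbstring. It maps only the two letters whose lowercase differs in Turkish before handing the rest to mb_strtolower().

```php
<?php
// Hypothetical helper, assuming the input is Turkish text in UTF-8.
// Only the dotted/dotless I pair is special-cased; everything else is
// left to mb_strtolower().
function turkishLower(string $s): string
{
    $s = str_replace(
        ["\u{0130}", 'I'],   // İ (I with dot above) and plain capital I
        ['i', "\u{0131}"],   // become i and ı (dotless i) respectively
        $s
    );

    return mb_strtolower($s, 'UTF-8');
}

echo turkishLower("MOZA\u{0130}K"), "\n"; // mozaik
```

Because plain "I" becomes dotless "ı" here, this is not suitable for text in other languages.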
 [2021-04-06 11:57 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2021-04-06 11:57 UTC] cmb@php.net
> […] and I'm not sure why it's specified to behave the way it
> does.

I assume that is to not lose information (e.g. for a later
mb_strtoupper(): <https://3v4l.org/gA2aH>).
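
A rough sketch of that round trip, assuming PHP 7.3 or later with ext/mbstring; the helper name roundTrip() is made up for this illustration. The point is that uppercasing the lowercased string does not collapse to a plain "I", because the combining dot is still there.

```php
<?php
// Hypothetical helper: lowercase, then uppercase again.
function roundTrip(string $s): string
{
    return mb_strtoupper(mb_strtolower($s, 'UTF-8'), 'UTF-8');
}

$original = "\u{0130}"; // LATIN CAPITAL LETTER I WITH DOT ABOVE
$result   = roundTrip($original);

// Show the raw UTF-8 bytes so that renderer quirks do not get in the way.
echo bin2hex($original), ' -> ', bin2hex($result), "\n";

// Had the dot been dropped while lowercasing, this would print bool(true).
var_dump($result === 'I');
```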

> The fact that the dot is rendered over the next character seems
> like a bug, […]

Indeed, but not a bug in ext/mbstring, but rather in Firefox (and
maybe other browsers/renderers).  It's displayed as expected in
Chrome.
 