Bug #77375 mb_strtolower carries accents to next character
Submitted: 2018-12-30 16:21 UTC Modified: 2018-12-30 16:41 UTC
From: sjon at hortensius dot net Assigned:
Status: Open Package: mbstring related
PHP Version: 7.3.0 OS: archlinux
Private report: No CVE-ID: None

 [2018-12-30 16:21 UTC] sjon at hortensius dot net
Description:
------------
I found this while going through https://3v4l.org/bughunt/7.3.0/7.2.13+7.2.12

It seems that accents on special characters are carried over to the next character when using mb_strtolower.

Test script:
---------------
See https://3v4l.org/7pP2v and the scripts that it's based on

echo mb_strtolower("MOZAİK");

Expected result:
----------------
mozaik

Actual result:
--------------
mozai̇k

 [2018-12-30 16:41 UTC] nikic@php.net
We're following what Unicode specifies here. Excerpt from SpecialCasing:

# Preserve canonical equivalence for I with dot. Turkic is handled below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

0069 0307 is an "i" followed by "combining dot above". The fact that the dot is rendered over the next character seems like a bug, as combining marks generally apply to the previous character.
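
To make the excerpt concrete, here is a small sketch that inspects the code points involved. The helper name codePoints is made up for illustration and is not part of mbstring; it assumes UTF-8 input and the mbstring extension being loaded.

```php
<?php
// Hypothetical helper (not part of mbstring): list the code points of a UTF-8 string.
function codePoints(string $s): array {
    $out = [];
    // Split into characters with the /u flag so multi-byte sequences stay intact.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $out[] = sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }
    return $out;
}

// U+0130 is one code point; its SpecialCasing lowercase form is two.
echo implode(' ', codePoints("\u{0130}")), "\n";   // U+0130
echo implode(' ', codePoints("i\u{0307}")), "\n";  // U+0069 U+0307

// What mb_strtolower() produces depends on the PHP version, so just print it.
echo implode(' ', codePoints(mb_strtolower("MOZA\u{0130}K", 'UTF-8'))), "\n";
```

If the last line ends in "U+0069 U+0307 U+006B", the string contains the combining dot after the "i", and any dot drawn over the "k" comes from how the text is rendered.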

Not sure what to do here. I can definitely see how the current behavior is unwanted if you're working with Turkish text, and I'm not sure why it's specified to behave the way it does.
 [2019-01-30 14:27 UTC] goncalomm at protonmail dot com
If you need to use this in version 7.3, you can work around it by creating a simple function that replaces the Turkish characters.


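<?php
$var = "MOZAİK";

// Replace Turkish letters with their closest ASCII equivalents.
// Note: this strips the diacritics from every listed letter, not only from "İ".
function TurkishFix($inputText) {
    $search  = array('ç', 'Ç', 'ğ', 'Ğ', 'ı', 'İ', 'ö', 'Ö', 'ş', 'Ş', 'ü', 'Ü');
    $replace = array('c', 'C', 'g', 'G', 'i', 'I', 'o', 'O', 's', 'S', 'u', 'U');
    return str_replace($search, $replace, $inputText);
}

// Transliterate first, then lowercase the ASCII-only result.
$string = TurkishFix($var);
echo mb_strtolower($string);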


Check here: https://3v4l.org/sYr8m
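
If the other accented letters need to survive, two alternative sketches may be worth trying. Both function names are made up for illustration. The first lowercases and then drops a combining dot above (U+0307) that directly follows an "i"; note that it would also remove such a dot if it was already present in the input. The second assumes that the MB_CASE_LOWER_SIMPLE mode added to mb_convert_case in PHP 7.3 applies only one-to-one mappings, so that U+0130 becomes a plain "i". Neither one is Turkish-aware (a plain "I" still becomes "i", not "ı").

```php
<?php
// Hypothetical helper: lowercase, then drop a combining dot above (U+0307)
// that directly follows "i". Assumes UTF-8 input.
function lowerWithoutStrayDot(string $s): string {
    $lower = mb_strtolower($s, 'UTF-8');
    return preg_replace('/i\x{0307}/u', 'i', $lower);
}

// Hypothetical helper: use the simple one-to-one case mapping
// (the MB_CASE_LOWER_SIMPLE constant requires PHP 7.3 or later).
function lowerSimple(string $s): string {
    return mb_convert_case($s, MB_CASE_LOWER_SIMPLE, 'UTF-8');
}

echo lowerWithoutStrayDot("MOZA\u{0130}K"), "\n";
echo lowerSimple("MOZA\u{0130}K"), "\n";
```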
 
Last updated: Thu May 23 15:01:32 2019 UTC