Bug #72803 Failure to transliterate on Linux
Submitted: 2016-08-10 13:10 UTC
Modified: 2018-08-25 13:17 UTC
From: rasmus at mindplay dot dk
Assigned: cmb
Status: Not a bug
Package: ICONV related
PHP Version: Irrelevant
OS: Linux
Private report: No
CVE-ID: None
 [2016-08-10 13:10 UTC] rasmus at mindplay dot dk
Description:
------------
I've come across one character that doesn't transliterate correctly with iconv() with the "TRANSLIT" option on Linux.

On Windows, it works - on every Linux build (PHP 5.3, 5.4, 5.5, 5.6, 7.0 and HHVM) it fails for this one character.

Unfortunately, this particular character is a normal letter in the Danish language, and the sites we build are in Danish.

The expected result posted below is that from a Windows build - the actual result is the erroneous result from a Linux build.

I know that libiconv is largely unmaintained, so maybe there is nothing to be done about this - a comment in the manual says to use mbstring and intl instead, but intl isn't a standard extension, and mbstring doesn't seem to have this feature?

I can work around this particular case, of course, by replacing ø and Ø myself first, but I find it problematic that this common operation isn't otherwise supported (or doesn't work) in PHP.
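
The manual workaround described above could look roughly like the following. This is only a sketch: `replaceOSlash()` is a hypothetical helper name, the source file is assumed to be saved as UTF-8, and the final iconv() result still depends on the platform's iconv implementation and locale.

```php
<?php

/**
 * Hypothetical helper: map ø/Ø to plain ASCII letters before calling iconv().
 */
function replaceOSlash(string $text): string
{
    // strtr() with an array replaces whole byte sequences, so the
    // multi-byte UTF-8 encodings of ø and Ø are matched as units.
    return strtr($text, ['ø' => 'o', 'Ø' => 'O']);
}

$string = 'æøå';
$prepared = replaceOSlash($string); // "æoå"

// The remaining characters are left to iconv(); what it returns for them
// varies between iconv implementations, so no output is claimed here.
$clean = iconv('UTF-8', 'ASCII//TRANSLIT', $prepared);

var_dump($clean);
```

This only covers the one character from this report; any other character the platform's iconv cannot transliterate would need its own entry in the map.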


Test script:
---------------
<?php

// https://gist.github.com/mindplay-dk/9b2fa55ba5f08ab0b9308295f936cd41

$string = 'æøå'; // HEX: c3 a6 c3 b8 c3 a5

$clean = iconv('UTF-8', 'ASCII//TRANSLIT', $string);

var_dump($clean); // "aeoa" on Windows, "ae?a" on Linux???


Expected result:
----------------
string(4) "aeoa"

Actual result:
--------------
string(4) "ae?a"

 [2016-08-10 17:07 UTC] cmb@php.net
Looks even worse for glibc-2.2.3/2: <https://3v4l.org/JNOga>.
Which version did you use?

Anyhow, I think that's not a PHP issue, but rather should be
solved upstream.
 [2016-08-15 08:01 UTC] rasmus at mindplay dot dk
To others looking for a solution, the only viable approach appears to be the "intl" extension - it's not enabled by default, but it does ship with the PHP binaries.

See here:

http://php.net/transliterator_transliterate
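
A minimal sketch of that approach, assuming the intl extension is loaded. `toAscii()` is a hypothetical helper name, and the transliterator ID `Any-Latin; Latin-ASCII` is one plausible choice of ICU transform rather than something specified in this report.

```php
<?php

/**
 * Hypothetical helper: transliterate UTF-8 text to ASCII using ICU (intl).
 */
function toAscii(string $text): string
{
    if (!function_exists('transliterator_transliterate')) {
        throw new RuntimeException('The intl extension is not loaded.');
    }

    // Convert any script to Latin first, then reduce Latin letters to ASCII.
    $result = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);

    if ($result === false) {
        throw new RuntimeException('Transliteration failed.');
    }

    return $result;
}

// Only run the demo when intl is actually available.
if (extension_loaded('intl')) {
    var_dump(toAscii('æøå'));
}
```

Because ICU ships its own transliteration rules, the result should not depend on the C library's iconv implementation or on the process locale.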
 [2018-08-25 13:17 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2018-08-25 13:17 UTC] cmb@php.net
Closing as not a bug, since it is obviously a bug or limitation of
glibc's iconv implementation.
 