The Transliterator class does not work correctly when converting Serbian from Cyrillic to Latin script. Every Cyrillic ј is converted to an uppercase J in the Latin output, even inside a word, where it should be a lowercase j.
Online conversion tools, which are probably also based on ICU, don't have this bug and do the conversion correctly.
I am attaching a code sample that shows the bug. I tested that the bug exists in both PHP 5.4 and 5.5.
<?php
$t = Transliterator::create('Serbian-Latin/BGN');
$source = 'Најгледанији сајтови';
echo '<ul>'
. '<li>Cyrillic source: ' . $source . '</li>'
. '<li>Expected transliteration: Najgledaniji sajtovi</li>'
. '<li>Actual transliteration: ' . $t->transliterate($source) . '</li>'
. '</ul>';
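This string: Најгледанији сајтови
Should be transliterated to: Najgledaniji sajtovi
But PHP transliterates it with an uppercase J in place of every ј, including those inside the words.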
Is your Cyrillic source string UTF-8 encoded? I have no idea how to encode it otherwise, but
with a UTF-8 source it gives the transliteration you expect. So that might be the key.
All my sources are in UTF-8; I rechecked with the isutf8 bash command.
OK, then it has to be ICU itself. I was testing on Windows previously, which has
ICU 50, but Ubuntu 13.04 ships with ICU 48 and I can reproduce what you describe there.
Which ICU version do you use? Most Linux distros have 48 at the moment. Maybe you
could try a newer ICU, even 51? But even now, from what I can see, it's unlikely a
PHP bug.
"but with UTF-8 source it gives the translit you expect"
That's not the case for me. Do you have an example online showing my example working? A gist on GitHub, for example.
>Ok, then it has to be ICU itself. I was testing on windows previously which has ICU 50, but ubuntu 13.04 ships with ICU 48 and I can repro what you say there.
> Which ICU version do you use? Most linux distros have 48 at the time. May be you could try a newer ICU, even 51? But even now from what I can see it's unlikely a PHP bug.
phpinfo() indicates that the ICU version is 220.127.116.11. I confess I don't know how to upgrade it to a newer version to test.
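For reference, the ICU version that the intl extension is linked against can also be read from a script instead of phpinfo(), through the INTL_ICU_VERSION constant. A minimal sketch, assuming the intl extension is loaded; the helper name icuVersionString() is hypothetical and only used for illustration:

```php
<?php
// Hypothetical helper: returns the ICU version string exposed by the
// intl extension, or null when the extension is not loaded.
function icuVersionString()
{
    return defined('INTL_ICU_VERSION') ? INTL_ICU_VERSION : null;
}

$version = icuVersionString();
echo $version === null
    ? "intl extension not loaded\n"
    : "ICU version: " . $version . "\n";
```

Run from the command line, this prints the same ICU version that phpinfo() lists in its intl section.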
I didn't say source, but "source cyrillic string UTF-8 encoded" ... well, that
might be nearly the same :)
I'm not going to expose my dev laptop on the net; anyway, the snippet you've posted
is all I've tried. On Windows with ICU 50 it works the way you expect, and on Ubuntu
with ICU 48 the erroneous behaviour you describe is reproducible. So please try a newer
ICU, that could be it.
Sorry, but your problem does not imply a bug in PHP itself. For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions. Due to the volume
of reports we can not explain in detail here why your report is not
a bug. The support channels will be able to provide an explanation
for you.
Thank you for your interest in PHP.
I've just tried with ICU 51 on Ubuntu and it works correctly. So this is ICU 48, and
I'd expect ICU 50 to work, too.
You'll need to compile ICU yourself and then link PHP with it. Or maybe get the
PECL variant, build it with a newer ICU and use it with your regular PHP. That should
work.
I compiled the ICU library from the 51.2 source and indeed the bug is no longer there. Too bad Linux distros don't ship a newer version, as it makes the transliteration feature a no-go in practice. Thanks a lot for your time on this!
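Given the findings in this thread (ICU 48 shows the problem, ICU 50, 51 and 51.2 do not), a script could check the ICU version before relying on Serbian-Latin/BGN. This is only a sketch: the helper name icuLooksNewEnough() is hypothetical, and the threshold of 50 is an assumption taken from the versions tested above, not a documented ICU boundary (ICU 49 was not tested here).

```php
<?php
// Hypothetical helper: true when the given ICU version is at least the
// assumed minimum. '50' is the lowest version reported as working in
// this thread; it is an assumption, not an official cut-off.
function icuLooksNewEnough($icuVersion, $minimum = '50')
{
    return version_compare($icuVersion, $minimum, '>=');
}

if (!defined('INTL_ICU_VERSION')) {
    echo "intl extension not loaded\n";
} elseif (!icuLooksNewEnough(INTL_ICU_VERSION)) {
    echo "ICU " . INTL_ICU_VERSION . " may produce an uppercase J inside words\n";
} else {
    $t = Transliterator::create('Serbian-Latin/BGN');
    if ($t === null) {
        echo "Serbian-Latin/BGN is not available in this ICU build\n";
    } else {
        echo $t->transliterate('Најгледанији сајтови') . "\n";
    }
}
```

The check only warns on older ICU builds; it does not correct the output, so upgrading ICU as described above remains the actual fix.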