Bug #29955 Support for Turkish/iso-8859-9
Submitted: 2004-09-02 17:15 UTC Modified: 2007-09-04 14:14 UTC
Votes:2
Avg. Score:4.5 ± 0.5
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:0 (0.0%)
From: jan at horde dot org Assigned: hirokawa (profile)
Status: Closed Package: mbstring related
PHP Version: 5CVS, 4CVS (2004-09-02) OS: Linux
Private report: No CVE-ID: None
 [2004-09-02 17:15 UTC] jan at horde dot org
Description:
------------
In ISO-8859-9 (Turkish), the uppercase form of "i" is a dotted uppercase "I", and the lowercase form of "I" is a dotless "i". But mb_strtolower() and mb_strtoupper() simply return the ASCII uppercase or lowercase counterparts.

You get the correct result with:
setlocale(LC_ALL, 'tr_TR');
echo strtoupper('i');
echo strtolower('I');

But the wrong results with:
echo mb_strtoupper('i', 'iso-8859-9');
echo mb_strtolower('I', 'iso-8859-9');



 [2005-02-22 11:10 UTC] moriyoshi@php.net
It turned out this is because mbstring doesn't take the 
locale into consideration.


 [2005-05-13 02:26 UTC] mustafa at deu dot edu dot tr
I get the same results as jan.

I need UTF-8 output to consume a web service, so I configured PHP 5.0.4 with the --enable-mbstring=all parameter (on a Linux system set to the Turkish locale).

I see that the mbstring extension's source code supports only a limited set of languages (German, English, Japanese, Korean, Russian, Chinese).

Is there a way to add our language (Turkish) to the source code? Are there any references on this extension's source?
 [2005-05-13 08:00 UTC] moriyoshi@php.net
Supporting the Turkish locale would require a complete 
overhaul of the entire extension, because the locale's 
character properties and required case-folding behaviour 
are very special.

The PHP-ICU extension could support anything, but that's 
still ongoing work by l0t3k.

 [2005-12-23 14:10 UTC] hirokawa@php.net
I don't know which mapping is standard (0x49 or 0xdd).
In ISO-8859-9 (Turkish), should the upper case of 'i' (0x69) 
always be translated to 'I' with a dot (0xdd)?
If yes, please point me to some URLs that describe 
the mapping.


 [2005-12-23 14:24 UTC] jan at horde dot org
See http://www.gymel.com/charsets/ISO8859-9.html#U0069 and http://www.gymel.com/charsets/ISO8859-9.html#U0049 under "Bemerkungen:" (remarks).
 [2005-12-23 14:28 UTC] derick@php.net
"man iso-8859-9" will tell you.

"i" maps to "0xdd"
and
"0xfd" maps to "I"

See also:
http://www.eki.ee/letter/chardata.cgi?lang=tr+Turkish&script=latin
 [2005-12-23 14:56 UTC] hirokawa@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php6.0-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php6.0-win32-latest.zip

Turkish language support is added in CVS HEAD.
When mbstring.language = Turkish,
Turkish case folding will be performed in ISO-8859-9.
(upper:0x69 -> 0xdd, lower:0x49->0xfd)
Otherwise, normal case folding is performed.
(upper:0x69 -> 0x49, lower:0x49->0x69)
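
For illustration only, the Turkish mapping above can be sketched in userland PHP on raw ISO-8859-9 bytes. This is a minimal sketch, not the mbstring implementation: the function names tr_upper_i() and tr_lower_i() are hypothetical, and only the four dotted/dotless "i" bytes are handled (0xfd -> 0x49 follows the mapping derick quoted; 0xdd -> 0x69 is assumed as the reverse of 0x69 -> 0xdd).

```php
<?php
// Hypothetical helpers, not part of mbstring. They expect a string that is
// already encoded as ISO-8859-9 and touch only the four "i" code points.

function tr_upper_i(string $s): string
{
    // i (0x69) -> dotted capital I (0xdd); dotless i (0xfd) -> I (0x49)
    return strtr($s, ["\x69" => "\xDD", "\xFD" => "\x49"]);
}

function tr_lower_i(string $s): string
{
    // I (0x49) -> dotless i (0xfd); dotted capital I (0xdd) -> i (0x69)
    return strtr($s, ["\x49" => "\xFD", "\xDD" => "\x69"]);
}

echo bin2hex(tr_upper_i('i')), "\n"; // dd
echo bin2hex(tr_lower_i('I')), "\n"; // fd
```

The array form of strtr() is used so that each byte is replaced at most once; a byte produced by one replacement is never fed into another.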

 [2007-01-05 14:31 UTC] jan at horde dot org
Any chance this is going to be backported to PHP 5.2? I guess mbstring is going to be obsolete with the Unicode and ICU support in PHP 6.
 [2007-01-05 14:33 UTC] jan at horde dot org
Oh, and by the way, this conversion should always happen for iso-8859-9, not only if mbstring.language is set to Turkish, because depending on that setting makes it completely useless in real-world applications.
 [2007-08-17 22:19 UTC] hirokawa@php.net
This change has already been backported to PHP 5.2.
In my understanding, it shouldn't always be applied to ISO-8859-9,
because the conversion result depends on the locale.
(Correct?)




 [2007-08-23 16:33 UTC] jan at horde dot org
No. The conversion has to be done this way for iso-8859-9 always, not only if the current locale is Turkish. Turkish is the only language that uses this charset.
 [2007-08-23 23:03 UTC] jani@php.net
Feedback given.
 [2007-09-04 14:14 UTC] hirokawa@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 