Bug #29955 Support for Turkish/iso-8859-9
Submitted: 2004-09-02 17:15 UTC Modified: 2007-09-04 14:14 UTC
Votes:2
Avg. Score:4.5 ± 0.5
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:0 (0.0%)
From: jan at horde dot org Assigned: hirokawa
Status: Closed Package: mbstring related
PHP Version: 5CVS, 4CVS (2004-09-02) OS: Linux
Private report: No CVE-ID: None
 [2004-09-02 17:15 UTC] jan at horde dot org
Description:
------------
In ISO-8859-9 (Turkish), the uppercase form of "i" is the dotted capital "İ" (0xDD), and the lowercase form of "I" is the dotless "ı" (0xFD). But mb_strtolower() and mb_strtoupper() simply return the ASCII uppercase or lowercase counterparts ("I" and "i").

You get the correct result with:
setlocale(LC_ALL, 'tr_TR');
echo strtoupper('i'); // 0xDD (İ)
echo strtolower('I'); // 0xFD (ı)

But the wrong results with:
echo mb_strtoupper('i', 'iso-8859-9'); // 0x49 (I)
echo mb_strtolower('I', 'iso-8859-9'); // 0x69 (i)
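For reference, the behaviour requested in this report can be sketched in plain PHP without mbstring. This is a minimal sketch, assuming the input is already a single-byte ISO-8859-9 string; the function names iso88599_case_pairs(), tr_upper_iso88599() and tr_lower_iso88599() are hypothetical and not part of mbstring or any PHP release. It builds byte tables for the ordinary letter pairs and adds the four Turkish special cases for use with strtr(). Note also that since PHP 8.2, strtoupper() and strtolower() are no longer locale-sensitive, so the setlocale() comparison above only reproduces on older versions.

```php
<?php
// Hypothetical helpers (not an mbstring API): byte-wise Turkish case
// conversion for strings that are already encoded in ISO-8859-9.
// Assumption: single-byte input; bytes without a case pair are left alone.

/** Returns [lowercase bytes, uppercase bytes] for the regular letter pairs. */
function iso88599_case_pairs(): array
{
    $lower = '';
    $upper = '';
    // ASCII a-z <-> A-Z, skipping i/I, which Turkish treats specially.
    for ($c = 0x61; $c <= 0x7A; $c++) {
        if ($c === 0x69) {
            continue;
        }
        $lower .= chr($c);
        $upper .= chr($c - 0x20);
    }
    // 0xE0-0xFE <-> 0xC0-0xDE, skipping 0xF7/0xD7 (division and
    // multiplication signs) and the dotless i 0xFD, whose capital is 'I'.
    for ($c = 0xE0; $c <= 0xFE; $c++) {
        if ($c === 0xF7 || $c === 0xFD) {
            continue;
        }
        $lower .= chr($c);
        $upper .= chr($c - 0x20);
    }
    return [$lower, $upper];
}

function tr_upper_iso88599(string $s): string
{
    [$lower, $upper] = iso88599_case_pairs();
    // Turkish rules: 'i' (0x69) -> 'İ' (0xDD), 'ı' (0xFD) -> 'I' (0x49).
    return strtr($s, $lower . "\x69\xFD", $upper . "\xDD\x49");
}

function tr_lower_iso88599(string $s): string
{
    [$lower, $upper] = iso88599_case_pairs();
    // Turkish rules: 'I' (0x49) -> 'ı' (0xFD), 'İ' (0xDD) -> 'i' (0x69).
    return strtr($s, $upper . "\x49\xDD", $lower . "\xFD\x69");
}

echo bin2hex(tr_upper_iso88599('i')), "\n"; // dd
echo bin2hex(tr_lower_iso88599('I')), "\n"; // fd
```

Using the three-argument form of strtr() keeps the translation strictly byte-for-byte, so each byte is converted at most once and 0x69 cannot be turned into 0xDD and then back again.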



 [2005-02-22 11:10 UTC] moriyoshi@php.net
It turned out this is because mbstring doesn't take the 
locale into consideration.


 [2005-05-13 02:26 UTC] mustafa at deu dot edu dot tr
I get the same results as Jan.

I need UTF-8 output to consume a web service, and I configured PHP 5.0.4 with the --enable-mbstring=all option (on Linux, with the Turkish locale set).

Looking at the source code, the mbstring extension supports only a limited set of languages (German, English, Japanese, Korean, Russian, Chinese).

Is there a way to add our language (Turkish) to the source code? Are there any references about this extension's source?
 [2005-05-13 08:00 UTC] moriyoshi@php.net
Supporting the Turkish locale would require a complete
overhaul of the entire extension, because the locale's
character properties and required case-folding behaviour
are very special.

The PHP-ICU extension could support it, but that is
still a work in progress by l0t3k.

 [2005-12-23 14:10 UTC] hirokawa@php.net
I don't know which mapping is the standard one (0x49 or 0xdd).
In ISO-8859-9 (Turkish), should the uppercase of 'i' (0x69)
always be translated to 'I' with dot (0xdd)?
If so, please point me to some URLs that describe
the mapping.


 [2005-12-23 14:24 UTC] jan at horde dot org
See http://www.gymel.com/charsets/ISO8859-9.html#U0069 and http://www.gymel.com/charsets/ISO8859-9.html#U0049 under "Bemerkungen:" (remarks).
 [2005-12-23 14:28 UTC] derick@php.net
"man iso-8859-9" will tell you.

In uppercase conversion, "i" maps to 0xdd (İ),
and 0xfd (ı) maps to "I"; lowercasing is the reverse.

See also:
http://www.eki.ee/letter/chardata.cgi?lang=tr+Turkish&script=latin
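The byte values above can be cross-checked against Unicode. This is a minimal sketch, assuming the standard ISO-8859-9 layout, which is ISO-8859-1 with six bytes reassigned to Turkish letters; the helper name iso88599_codepoint() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: Unicode code point of one ISO-8859-9 byte.
 * Every byte except the six Turkish ones has the same value as in
 * ISO-8859-1, which in turn equals its Unicode code point.
 */
function iso88599_codepoint(int $byte): int
{
    static $turkish = [
        0xD0 => 0x011E, // Ğ LATIN CAPITAL LETTER G WITH BREVE
        0xDD => 0x0130, // İ LATIN CAPITAL LETTER I WITH DOT ABOVE
        0xDE => 0x015E, // Ş LATIN CAPITAL LETTER S WITH CEDILLA
        0xF0 => 0x011F, // ğ LATIN SMALL LETTER G WITH BREVE
        0xFD => 0x0131, // ı LATIN SMALL LETTER DOTLESS I
        0xFE => 0x015F, // ş LATIN SMALL LETTER S WITH CEDILLA
    ];
    return $turkish[$byte] ?? $byte;
}

printf("0xdd -> U+%04X\n", iso88599_codepoint(0xDD)); // 0xdd -> U+0130
printf("0xfd -> U+%04X\n", iso88599_codepoint(0xFD)); // 0xfd -> U+0131
```

Because "İ" and "ı" live at 0xDD and 0xFD rather than next to the ASCII letters, the usual ASCII offset of 0x20 maps "i" to "I" and can never produce the Turkish pairs; they have to be special-cased.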
 [2005-12-23 14:56 UTC] hirokawa@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php6.0-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php6.0-win32-latest.zip

Turkish language support has been added in CVS HEAD.
When mbstring.language = Turkish,
Turkish case folding is performed in ISO-8859-9
(upper: 0x69 -> 0xdd, lower: 0x49 -> 0xfd).
Otherwise, normal case folding is performed
(upper: 0x69 -> 0x49, lower: 0x49 -> 0x69).
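The switch described here can be pictured as a flag on an otherwise ordinary ASCII case conversion. This is a hypothetical illustration, not mbstring's implementation: iso88599_ascii_case() and its $turkish parameter are made up for this sketch, with $turkish standing in for "mbstring.language = Turkish", and only ASCII letters are handled.

```php
<?php
// Hypothetical sketch of the two modes for ISO-8859-9 input.
// Only ASCII letters are converted; every other byte passes through.
function iso88599_ascii_case(string $s, bool $toUpper, bool $turkish): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        if ($toUpper && $b >= 0x61 && $b <= 0x7A) {
            // Turkish mode: 0x69 -> 0xdd; otherwise 0x69 -> 0x49 like any letter.
            $out .= ($turkish && $b === 0x69) ? "\xDD" : chr($b - 0x20);
        } elseif (!$toUpper && $b >= 0x41 && $b <= 0x5A) {
            // Turkish mode: 0x49 -> 0xfd; otherwise 0x49 -> 0x69.
            $out .= ($turkish && $b === 0x49) ? "\xFD" : chr($b + 0x20);
        } else {
            $out .= $s[$i];
        }
    }
    return $out;
}

echo bin2hex(iso88599_ascii_case('i', true, true)), "\n";  // dd
echo bin2hex(iso88599_ascii_case('i', true, false)), "\n"; // 49
```

The rest of this thread is essentially about what should set that flag: the language setting, or simply the fact that the string is ISO-8859-9.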

 [2007-01-05 14:31 UTC] jan at horde dot org
Is there any chance this will be backported to PHP 5.2? I guess mbstring will become obsolete with the Unicode and ICU support in PHP 6.
 [2007-01-05 14:33 UTC] jan at horde dot org
Oh, and by the way, this conversion should always happen for iso-8859-9, not only when mbstring.language is set to Turkish; tied to that setting, it is completely useless in real-world applications.
 [2007-08-17 22:19 UTC] hirokawa@php.net
This change has already been backported to PHP 5.2.
As I understand it, it shouldn't always be applied to ISO-8859-9,
because the conversion result depends on the locale.
(Correct?)




 [2007-08-23 16:33 UTC] jan at horde dot org
No. The conversion must always be done this way for iso-8859-9, not only when the current locale is Turkish. Turkish is the only language that uses this charset.
 [2007-08-23 23:03 UTC] jani@php.net
Feedback given.
 [2007-09-04 14:14 UTC] hirokawa@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 10:01:29 2024 UTC