Bug #22003 XML parsing and strtoupper broken in Turkish
Submitted: 2003-02-01 18:23 UTC Modified: 2003-03-02 13:24 UTC
From: spud at nothingness dot org Assigned:
Status: Not a bug Package: XML related
PHP Version: 4.3.0 OS: Linux Redhat 7.2
Private report: No CVE-ID: None
 [2003-02-01 18:23 UTC] spud at nothingness dot org
I was trying to do some XML-RPC work in PHP while my locale was set to 'tr_TR.utf8'. I (and others) have reported other bugs related to Turkish and PHP, all caused by the way Turkish treats the letter "i": it has separate dotted and dotless forms, so the uppercase of "i" is not "I".

While issues related to the lower-casing of object classes have been resolved in 4.3.0, I encountered similar problems with XML parsing.

Specifically, when the locale is set to 'tr_TR.utf8' and an XML parser is created with CASE_FOLDING enabled, "int" tags and "string" tags are reported as "iNT" and "STRiNG". Consequently, functions that recognize tag names by their all-uppercase form fail to recognize these tags.

The problem is also evident in PHP's built-in strtoupper function. The following code demonstrates both problems:

<?php
putenv("LANG=tr_TR.utf8"); 
$chk = setlocale(LC_ALL,'tr_TR.utf8');
if ($chk) echo ("Setting language to Turkish<br>\n");

$x = "<string>foo</string>";
echo ("x is ".htmlspecialchars($x,ENT_COMPAT,'utf-8')."<br>\n");
$y = strtoupper($x);
echo ("Strtoupper yields ".htmlspecialchars($y,ENT_COMPAT,'utf-8')."<br>\n");

function startElement($parser, $name, $attrs) {
    print "Start tag name: $name<br>\n";
}

function endElement($parser, $name) {
    print "End tag name: $name<br>\n";
}
function charData($parser, $data) {
    print "Character Data: $data<br>\n";
}

$parser = xml_parser_create('utf-8');
xml_parser_set_option($parser,XML_OPTION_CASE_FOLDING,true);
xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "charData"); 
echo ("Parsing with utf-8 parser, case_folding enabled<br>\n");
xml_parse($parser,$x);
xml_parser_free($parser);
?>
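
One possible way to sidestep the problem, sketched here rather than offered as a definitive fix: turn case folding off so the parser passes tag names through as written, then uppercase only the ASCII letters by hand. The helper ascii_strtoupper and the handlers collectStart and collectEnd are hypothetical names introduced for this illustration; they are not part of PHP. The sketch assumes the XML extension is available.

```php
<?php
// Hypothetical helper (not a PHP built-in): uppercases only the ASCII
// letters a-z. The three-argument form of strtr() maps bytes one-to-one
// and never consults the current locale.
function ascii_strtoupper($s)
{
    return strtr($s, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
}

$seen = array(); // tag names collected by the start handler

function collectStart($parser, $name, $attrs)
{
    global $seen;
    // Normalise the name ourselves instead of relying on case folding.
    $seen[] = ascii_strtoupper($name);
}

function collectEnd($parser, $name)
{
    // Nothing to do for end tags in this sketch.
}

$parser = xml_parser_create('UTF-8');
// Case folding off: names arrive exactly as written in the document.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($parser, 'collectStart', 'collectEnd');
xml_parse($parser, '<string>foo</string>', true);

echo implode(',', $seen), "\n";
```

Because the mapping table covers only a-z, bytes belonging to multi-byte UTF-8 characters are left untouched, which is usually what is wanted for protocol keywords such as XML-RPC tag names.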

 [2003-03-02 13:24 UTC] moriyoshi@php.net
According to the Unicode specification, case-folding behaviour varies with the locale settings, so this is not a bug.

See http://www.unicode.org/reports/tr21/tr21-3.html for details.

This is also related to the following patch to the i18n part of the glibc source:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/localedata/locales/tr_TR.diff?r1=1.12&r2=1.13&cvsroot=glibc
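
Since the behaviour follows the locale, a script that needs Turkish formatting but ASCII case conversion can, as a rough sketch, reset only the LC_CTYPE category to "C" after selecting the Turkish locale. This assumes that case conversion in strtoupper (and, presumably, the XML parser's case folding) is governed by LC_CTYPE; the variable names are illustrative, and setlocale returns false if 'tr_TR.utf8' is not installed on the system.

```php
<?php
// Select the Turkish locale for all categories.
// $chk is false if the locale is not installed on this system.
$chk = setlocale(LC_ALL, 'tr_TR.utf8');

// Put character classification and case conversion back on the "C" locale,
// leaving the other categories (dates, numbers, collation) as set above.
setlocale(LC_CTYPE, 'C');

$upper = strtoupper('<string>foo</string>');
echo $upper, "\n";
```

The trade-off is that locale-aware handling of non-ASCII characters by the C library is lost for the whole process, so this only suits code that treats case conversion as an ASCII-only operation.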
