Bug #28154 Simple Xml output only utf-8
Submitted: 2004-04-26 11:36 UTC Modified: 2004-09-13 08:12 UTC
Votes:2
Avg. Score:5.0 ± 0.0
Reproduced:2 of 2 (100.0%)
Same Version:2 (100.0%)
Same OS:1 (50.0%)
From: jay at kuantic dot com Assigned:
Status: Not a bug Package: *XML functions
PHP Version: 5.0.0RC2 OS: *
Private report: No CVE-ID: None
 [2004-04-26 11:36 UTC] jay at kuantic dot com
Description:
------------
SimpleXML seems to output only UTF-8, no matter what is specified in encoding='iso-8859-1', and there is no simple function to change the encoding.

Reproduce code:
---------------
<?php
// Save this file as ISO-8859-1 so the é below matches the declared encoding.
$xmlstr = <<<XML
<?xml version='1.0' encoding='iso-8859-1'?>
<root>
	<section>
		<title>Gestion des Objets</title>
		<title>Gestion des Géolocalisation</title>
		<title>Autres questions...</title>
	</section>
</root>
XML;

$xml = simplexml_load_string($xmlstr);
foreach ($xml->section->title as $title) {
	echo $title, '<br />';
}
?>

Expected result:
----------------
Gestion des Objets
Gestion des Géolocalisation
Autres questions...

Actual result:
--------------
Gestion des Objets
Gestion des GÃ©olocalisation
Autres questions...

 [2004-04-26 22:05 UTC] helly@php.net
See also #28169: SimpleXML not parsing scandinavian characters correctly.
 [2004-06-03 19:41 UTC] jerome dot wagner at oreka dot com
The OS is not given in the bug report.
I reproduced the problem on Windows XP with PHP 5 RC2.
 [2004-06-07 09:42 UTC] chregu@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Use the iconv or utf8_decode function to convert to
your desired charset after getting the values from
SimpleXML.
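
A minimal sketch of the conversion suggested above, assuming the iconv and SimpleXML extensions are loaded. The é is written as the byte escape "\xE9" so the example does not depend on the encoding of the script file itself; the variable names are illustrative only.

```php
<?php
// ISO-8859-1 input: "\xE9" is the single latin1 byte for é.
$xmlstr = "<?xml version='1.0' encoding='iso-8859-1'?>"
        . "<root><section><title>G\xE9olocalisation</title></section></root>";

$xml = simplexml_load_string($xmlstr);

// SimpleXML hands values back as UTF-8, whatever the input encoding was.
$utf8 = (string) $xml->section->title;

// Convert back to the desired charset before output or storage.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo bin2hex($utf8), "\n";   // é shows up as the two bytes c3 a9
echo bin2hex($latin1), "\n"; // é shows up as the single byte e9
```

iconv returns false when a character has no equivalent in the target charset, so the return value is worth checking before the string is used.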
 [2004-09-11 19:07 UTC] bob dot siefkes at nec-computers dot com
It might not be seen as a bug, but for me it really behaves differently than expected. For instance, xsltproc keeps the same character set (latin1 in my case). This matters especially because the data should be stored in a MySQL record, and MySQL does not (yet) support UTF-8.
Because SimpleXML does not keep latin1, I found that utf8_decode (and also iconv) cannot convert the euro sign properly. The euro sign is character code 128 decimal in latin1 (= iso-8859-1). I'm using PHP 5.0.1 on IIS5.
 [2004-09-13 08:12 UTC] derick@php.net
The euro sign doesn't exist in iso-8859-1, only in iso-8859-15. Some crappy Windows code page (Windows-1252) puts it on chr(128).
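
The point about the euro sign can be checked with a short sketch, assuming an iconv build that knows the charset names CP1252 and ISO-8859-15 (glibc and GNU libiconv both do). The variable names are illustrative only.

```php
<?php
// The euro sign (U+20AC) as UTF-8 bytes, as SimpleXML would return it.
$euro = "\xE2\x82\xAC";

// Windows-1252 places the euro sign at byte 0x80 (decimal 128).
$cp1252 = iconv('UTF-8', 'CP1252', $euro);

// ISO-8859-15 places it at byte 0xA4.
$latin9 = iconv('UTF-8', 'ISO-8859-15', $euro);

// ISO-8859-1 has no euro sign, so this conversion cannot succeed;
// @ hides the notice that iconv typically raises here.
$latin1 = @iconv('UTF-8', 'ISO-8859-1', $euro);

echo bin2hex($cp1252), "\n";
echo bin2hex($latin9), "\n";
var_dump($latin1);
```

If the data really uses byte 128 for the euro sign, the target charset to convert to is Windows-1252 rather than ISO-8859-1.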