Bug #40433 SimpleXML enforces wrong charset
Submitted: 2007-02-10 22:07 UTC Modified: 2007-02-10 22:32 UTC
From: consensus at gmail dot com Assigned:
Status: Not a bug Package: SimpleXML related
PHP Version: 5.2.1 OS: any
Private report: No CVE-ID: None
 [2007-02-10 22:07 UTC] consensus at gmail dot com
Description:
------------
SimpleXML behaves incorrectly with regard to the charset (yes, the behaviour is defined, but it is still wrong).

SimpleXML can read any XML file as long as the charset is declared in the XML header (encoding="...").
But when it returns values, it no longer honours that encoding and forces UTF-8.
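
For illustration, the behaviour described above can be reproduced with a short script. This is only a sketch: the sample document and the variable names are made up for the example, and it assumes the SimpleXML extension is enabled.

```php
<?php
// Hypothetical sample document, declared and encoded as ISO-8859-1.
// "\xE9" is the single-byte ISO-8859-1 encoding of "é".
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
     . "<root><name>caf\xE9</name></root>";

$doc = simplexml_load_string($xml);
$value = (string) $doc->name;

// The value comes back as UTF-8: "é" is now the two bytes 0xC3 0xA9.
echo bin2hex($value), "\n"; // prints 636166c3a9
```

The input contained the byte 0xE9, but the returned string contains 0xC3 0xA9, so the declared encoding is used for parsing only, not for output.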

While this may be defined behaviour, it is still a very unclean/ignorant feature.
It is the only function I know of that behaves like this.

Charsets exist for good reason,
and of course you can convert each value you get back into the correct charset.
But here is why this is generally a bad idea:
a) Everyone expects the function not to change the charset.
b) It is a big waste of CPU time.
   If you have millions of values, you need millions of
   function calls to convert them all back!
   Depending on the application you write, this can become a
   real problem.


I suggest rethinking this design decision, as it will become a problem for many others over time.


 [2007-02-10 22:32 UTC] chregu@php.net
Yes, it's defined behaviour and will stay that way.

About the "waste of CPU cycles": the library used for
SimpleXML (and all other XML extensions) is libxml2, and this
library converts everything to UTF-8 internally, regardless
of what the input encoding is.

So you have to convert it back to whatever charset you want
in your PHP script.
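
A minimal sketch of that conversion, assuming the iconv extension is available and that the target charset is ISO-8859-1. The sample document and the helper name `fromUtf8` are hypothetical and not part of SimpleXML or of this report.

```php
<?php
// Hypothetical helper: convert a UTF-8 string returned by SimpleXML
// into the charset the application works with.
function fromUtf8(string $utf8, string $targetCharset): string
{
    $converted = iconv('UTF-8', $targetCharset, $utf8);
    if ($converted === false) {
        // iconv() fails when a character cannot be represented in the target charset.
        throw new RuntimeException("Cannot convert value to $targetCharset");
    }
    return $converted;
}

// Hypothetical sample document, declared and encoded as ISO-8859-1.
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
     . "<root><name>caf\xE9</name></root>";

$doc = simplexml_load_string($xml);

// SimpleXML hands back UTF-8; convert it back explicitly.
$latin1 = fromUtf8((string) $doc->name, 'ISO-8859-1');

echo bin2hex($latin1), "\n"; // prints 636166e9
```

If only a few values are ever written out in the legacy charset, converting at the point of output rather than for every value read is one way to limit the number of calls.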
 