Bug #40433 SimpleXML enforces wrong charset
Submitted: 2007-02-10 22:07 UTC Modified: 2007-02-10 22:32 UTC
From: consensus at gmail dot com Assigned:
Status: Not a bug Package: SimpleXML related
PHP Version: 5.2.1 OS: any
Private report: No CVE-ID: None


 [2007-02-10 22:07 UTC] consensus at gmail dot com
SimpleXML behaves wrongly with respect to charsets (yes, the behaviour is defined, but it is still wrong).

SimpleXML can read any XML file as long as the charset is given in the XML declaration (encoding="...").
But when it returns values, it no longer honours that encoding and forces UTF-8.

While this may be defined behaviour, it is still a very unclean/ignorant feature.
It is the only function I know of that behaves like this.

Charsets exist for good reason,
and of course you can convert each value you get back into the correct charset.
But here is why this is generally a bad idea:
a) Everyone expects the function not to change the charset.
b) It is a big waste of CPU time.
   If you have millions of values, you need millions of
   function calls to convert them all back!
   Depending on the application you are writing, this can
   become a real problem.

I suggest rethinking this design decision, as it will become a problem for many others over time.


 [2007-02-10 22:32 UTC]
Yes, it's defined behaviour and will stay that way.

About the "waste of CPU cycles": the library used for
SimpleXML (and all other XML extensions) is libxml2, and this
library converts everything to UTF-8 internally, regardless of
what the input encoding is.

So you have to convert it back to whatever charset you want
in your PHP script.
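
For illustration, converting a value back might look roughly like the sketch below. The sample document, the variable names, and the choice of iconv() (which assumes the iconv extension is available) are assumptions made for this example and are not taken from the report.

```php
<?php
// Hypothetical sample document, declared and encoded as ISO-8859-1.
// "\xE9" is the single ISO-8859-1 byte for "e with acute accent".
$latin1Xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
           . "<root><name>caf\xE9</name></root>";

$xml = simplexml_load_string($latin1Xml);

// SimpleXML returns the value as UTF-8, whatever the input encoding was.
$utf8Value = (string) $xml->name;

// Convert back to the original charset explicitly.
// Assumes the iconv extension is enabled.
$latin1Value = iconv('UTF-8', 'ISO-8859-1', $utf8Value);

echo bin2hex($utf8Value), PHP_EOL;   // 636166c3a9
echo bin2hex($latin1Value), PHP_EOL; // 636166e9
```

The hex dumps show the accented character as two bytes in the value SimpleXML returns and as one byte again after the conversion.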