Doc Bug #41000 Document DOM/libxml's handling of character encodings (UTF-8)
Submitted: 2007-04-05 01:39 UTC Modified: 2007-08-17 10:40 UTC
From: phpbug at elitecoders dot com Assigned:
Status: Closed Package: Documentation problem
PHP Version: Irrelevant OS: Irrelevant
Private report: No CVE-ID: None
 [2007-04-05 01:39 UTC] phpbug at elitecoders dot com
Description:
------------
I've been using PHP's DOM extensions (both dom and domxml) for over a 
year, and only today, after a lot of research, did I realize that PHP 
uses libxml and that libxml uses UTF-8 internally. Being ignorant of 
this has left me with some serious character encoding issues.


I strongly suggest adding a prominent note near the beginning of the 
DOM pages, where this information would be most useful. In addition to 
the note, it might also be a good idea to link users to 
http://xmlsoft.org/encoding.html, which goes over how libxml handles 
encodings and why it uses UTF-8 internally.


Definitely a very visible note should be added here:
http://www.php.net/manual/en/ref.dom.php

On http://www.php.net/manual/en/function.dom-domdocument-construct.php
the encoding parameter is described as "The encoding of the document as 
part of the XML declaration." It should be clarified that passing a 
character encoding here does not affect the internal character encoding: 
strings containing characters outside the ASCII range must still be 
UTF-8 encoded (for example with utf8_encode() for ISO-8859-1 input).
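
A minimal sketch of the distinction described above, assuming the iconv extension is available. It uses iconv() rather than utf8_encode(), since utf8_encode() is deprecated as of PHP 8.2. The element name and sample string are made up for illustration.

```php
<?php
// "café" as ISO-8859-1 bytes: é is the single byte 0xE9.
$latin1 = "caf\xE9";

// DOM expects UTF-8 input, so convert before creating nodes.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

// The second argument only sets the encoding recorded for the document;
// it does not change what encoding the DOM methods expect as input.
$doc = new DOMDocument('1.0', 'ISO-8859-1');
$root = $doc->appendChild($doc->createElement('name'));
$root->appendChild($doc->createTextNode($utf8));

// Values read back from the tree are UTF-8, whatever encoding was declared.
$readBack = $root->textContent;

// The declared encoding appears in the XML declaration of the output.
$xml = $doc->saveXML();
echo $xml;
```

Passing the raw ISO-8859-1 bytes to createTextNode() instead would hand libxml a string that is not valid UTF-8, which is the situation this report is about.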


On http://www.php.net/manual/en/function.dom-domelement-construct.php 
and http://www.php.net/manual/en/function.dom-domtext-construct.php the 
pages need to at least warn that characters outside the ASCII range 
must be UTF-8 encoded.
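
As a sketch of what such a warning implies for calling code, the example below normalizes values before passing them to the DOMElement and DOMText constructors. The helper to_dom_utf8() is hypothetical (not part of PHP), and its "valid UTF-8, else assume ISO-8859-1" check is only a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper: return $s unchanged if it is already valid UTF-8,
// otherwise assume it is in $fallback and convert it with iconv().
function to_dom_utf8(string $s, string $fallback = 'ISO-8859-1'): string
{
    // With the /u modifier, preg_match() returns 1 only for valid UTF-8 subjects.
    if (preg_match('//u', $s) === 1) {
        return $s;
    }
    return iconv($fallback, 'UTF-8', $s);
}

$doc = new DOMDocument('1.0', 'UTF-8');

// "Zürich" given as ISO-8859-1 bytes (ü is 0xFC); converted before use.
$city = $doc->appendChild(new DOMElement('city', to_dom_utf8("Z\xFCrich")));

// " (München)" already given as UTF-8 bytes (ü is 0xC3 0xBC); passed through.
$city->appendChild(new DOMText(to_dom_utf8(" (M\xC3\xBCnchen)")));

echo $doc->saveXML();
```

When the source encoding is known, converting explicitly is more reliable than guessing as this helper does.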

I'm sure there are other pages as well, but I need to get going.

Reproduce code:
---------------
Not applicable

Expected result:
----------------
Not applicable

Actual result:
--------------
Not applicable

 [2007-04-08 19:13 UTC] ezyang@php.net
While I think that the DOM documentation needs to be improved overall, this bug is covering a very specific issue, namely dealing with UTF-8 in libxml. I have changed the summary accordingly.
 [2007-08-17 10:40 UTC] vrana@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.

"DOM extension uses UTF-8 encoding. Use utf8_encode() and utf8_decode() to work with texts in ISO-8859-1 encoding or Iconv for other encodings."
 
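
For the "Iconv for other encodings" case in the note, a round trip might look like the sketch below. It assumes the iconv extension is available and that the platform's iconv supports ISO-8859-2. The sample string and element name are illustrative only.

```php
<?php
// "łódź" as ISO-8859-2 bytes, an encoding utf8_encode() does not handle.
$latin2 = "\xB3\xF3d\xBC";

// Convert to UTF-8 on the way into the DOM.
$utf8 = iconv('ISO-8859-2', 'UTF-8', $latin2);

$doc = new DOMDocument('1.0', 'UTF-8');
$node = $doc->appendChild($doc->createElement('city'));
$node->appendChild($doc->createTextNode($utf8));

// Text read from the DOM is UTF-8; convert back on the way out.
$roundTrip = iconv('UTF-8', 'ISO-8859-2', $node->textContent);

var_dump($roundTrip === $latin2);
```

Keeping the conversions at the boundary, once on input and once on output, lets the rest of the code treat every DOM string as UTF-8.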