Bug #37154 All strings input must be utf8-encoded
Submitted: 2006-04-21 14:37 UTC Modified: 2006-04-21 15:07 UTC
From: troelskn at gmail dot com Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.1.2 OS: *
Private report: No CVE-ID: None
 [2006-04-21 14:37 UTC] troelskn at gmail dot com
After some digging around and experimentation, I have found that the DOM extension needs all input strings to be UTF-8 encoded. This means that any code using the extension must be sprinkled with utf8_encode.

The problem probably cannot be fixed without breaking backward compatibility, so the sanest choice may be to leave it as is, but at least update the documentation to state this.
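The workaround described in the report can be sketched as follows. This is a minimal illustration, assuming the application's strings are ISO-8859-1 (Latin-1) encoded; the helper name `latin1ToUtf8` is hypothetical. It uses `mb_convert_encoding` rather than `utf8_encode`, which performs the same Latin-1 to UTF-8 conversion but is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: convert an ISO-8859-1 string to UTF-8 before handing
// it to the DOM API, which treats every string argument as UTF-8.
function latin1ToUtf8(string $s): string
{
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

// "Grød" in ISO-8859-1: the byte 0xF8 is "ø".
$latin1 = "Gr\xF8d";

$dom  = new DOMDocument('1.0', 'ISO-8859-1');
$root = $dom->appendChild($dom->createElement('document'));

// Convert first; passing $latin1 unconverted would hand libxml2 bytes that
// are not valid UTF-8.
$root->appendChild($dom->createTextNode(latin1ToUtf8($latin1)));

// saveXML() serializes in the document's declared encoding, so the "ø"
// comes back out as the single Latin-1 byte 0xF8.
$xml = $dom->saveXML();
echo $xml;
```

Note the asymmetry this shows: the declared document encoding only affects serialization, while strings going in still have to be UTF-8.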


 [2006-04-21 15:02 UTC]
The *default* input encoding is UTF-8, but you can always declare another one with <?xml version="1.0" encoding="<your encoding>"?>.
All result data are in UTF-8 anyway; this is a libxml2 feature.
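The maintainer's point about the XML declaration applies to parsed input. A minimal sketch, assuming a Latin-1 encoded XML string loaded with `loadXML()`: the parser decodes the bytes according to the declared encoding, and values read back through the DOM API (here `textContent`) come out as UTF-8.

```php
<?php
// Latin-1 bytes with a matching XML declaration: 0xF8 is "ø" in ISO-8859-1.
$latin1Xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<document>Gr\xF8d</document>";

$dom = new DOMDocument();
$dom->loadXML($latin1Xml);

// libxml2 stores text internally as UTF-8, so "ø" is returned as the
// two-byte UTF-8 sequence 0xC3 0xB8.
$text = $dom->documentElement->textContent;
var_dump($text === "Gr\xC3\xB8d");
```

This covers only parsing; strings passed directly to methods such as `createElement()` or `createTextNode()` do not go through the declaration and are still expected to be UTF-8.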
 [2006-04-21 15:07 UTC] troelskn at gmail dot com
Not true.

<?php
// Candidate encodings for mb_detect_encoding(), tried in order.
$mb_detect_charsets = "ASCII,UTF-8,ISO-8859-1";

// Document declared as UTF-8.
$dom = new DOMDocument("1.0", "UTF-8");
$doc = $dom->appendChild($dom->createElement("document"));

echo mb_detect_encoding($dom->saveXML(), $mb_detect_charsets) . "<br>";

// Same document, declared as ISO-8859-1.
$dom = new DOMDocument("1.0", "ISO-8859-1");
$doc = $dom->appendChild($dom->createElement("document"));

echo mb_detect_encoding($dom->saveXML(), $mb_detect_charsets) . "<br>";

outputs:
Removing utf8_encode crashes the second example.
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Wed Feb 05 13:01:33 2025 UTC