Bug #37154 All strings input must be utf8-encoded
Submitted: 2006-04-21 14:37 UTC Modified: 2006-04-21 15:07 UTC
From: troelskn at gmail dot com Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.1.2 OS: *
Private report: No CVE-ID: None
 [2006-04-21 14:37 UTC] troelskn at gmail dot com
Description:
------------
After some digging around and experimentation, I have found that the DOM extension needs all input strings to be UTF-8 encoded. This means that any code using the extension must be sprinkled with utf8_encode calls.

The problem probably cannot be fixed without breaking backward compatibility, so the sanest choice may be to leave it as is, but at least update the documentation to state this.


 [2006-04-21 15:02 UTC] tony2001@php.net
Wrong.
The *default* input encoding is UTF-8, but you can always declare another one with <?xml version="1.0" encoding="<your encoding>"?>.
All result data is in UTF-8 anyway; this is a libxml2 feature.
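
A minimal sketch of the behaviour described above, assuming the input is loaded with loadXML() and carries its own encoding declaration. The sample document and the variable names are made up for illustration; the Latin-1 byte is written as an escape so the example does not depend on how the script file is saved.

```php
<?php
// "\xE9" is "é" in ISO-8859-1; the XML declaration tells libxml2 how to decode it.
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
     . "<document>caf\xE9</document>";

$dom = new DOMDocument();
$dom->loadXML($xml);

// Strings read back through the DOM API are UTF-8,
// whatever encoding the input declared.
$text = $dom->documentElement->textContent;
echo bin2hex($text), "\n"; // 636166c3a9 ("é" is now the two bytes C3 A9)
```

Note that this only covers markup parsed from a string or file. Strings passed directly to methods such as createTextNode() carry no declaration, so they are expected to be UTF-8 already.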
 [2006-04-21 15:07 UTC] troelskn at gmail dot com
Not true.

$mb_detect_charsets = "ASCII,UTF-8,ISO-8859-1";

// Case 1: document declared as UTF-8.
// (The script file is assumed to be saved as ISO-8859-1, hence utf8_encode.)
$dom = new DOMDocument("1.0", "UTF-8");
$doc = $dom->appendChild($dom->createElement("document"));
$doc->appendChild($dom->createTextNode(utf8_encode("Iñtërnâtiônàlizætiøn")));

echo mb_detect_encoding($dom->saveXML(), $mb_detect_charsets) . "<br>";

// Case 2: document declared as ISO-8859-1; the input still has to be UTF-8.
$dom = new DOMDocument("1.0", "ISO-8859-1");
$doc = $dom->appendChild($dom->createElement("document"));
$doc->appendChild($dom->createTextNode(utf8_encode("Iñtërnâtiônàlizætiøn")));

echo mb_detect_encoding($dom->saveXML(), $mb_detect_charsets) . "<br>";

-------------------------------------------------------
outputs :
UTF-8
ISO-8859-1
-------------------------------------------------------
Removing utf8_encode crashes the second example.
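
For readers on current PHP versions, a hedged sketch of the same idea: convert Latin-1 input to UTF-8 before handing it to the DOM, even when the document itself is declared as ISO-8859-1. It uses mb_convert_encoding() in place of utf8_encode(), which is deprecated as of PHP 8.2, and assumes the mbstring extension is available. The variable names are illustrative, and the test string is written with byte escapes so the result does not depend on the encoding of the script file.

```php
<?php
// Latin-1 test string written as raw ISO-8859-1 bytes.
$latin1 = "I\xF1t\xEBrn\xE2ti\xF4n\xE0liz\xE6ti\xF8n";

// The DOM methods expect UTF-8, so convert first.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// The encoding given here only controls how saveXML() serializes the document.
$dom = new DOMDocument('1.0', 'ISO-8859-1');
$doc = $dom->appendChild($dom->createElement('document'));
$doc->appendChild($dom->createTextNode($utf8));

$out = $dom->saveXML();                            // serialized with the declared encoding
$roundTrip = $dom->documentElement->textContent;   // still UTF-8 when read back
```

The constructor argument and the input encoding are therefore separate concerns: the former affects output, while input strings are treated as UTF-8 in both cases.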
 