Bug #37154 All strings input must be utf8-encoded
Submitted: 2006-04-21 14:37 UTC Modified: 2006-04-21 15:07 UTC
From: troelskn at gmail dot com Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.1.2 OS: *
Private report: No CVE-ID: None

 
 [2006-04-21 14:37 UTC] troelskn at gmail dot com
Description:
------------
After some digging around and experimentation, I have found that the DOM extension needs all input strings to be UTF-8 encoded. This means that any code using the extension must be sprinkled with utf8_encode calls.

The problem probably cannot be fixed without breaking backward compatibility, so the sanest choice may be to leave the behavior as it is, but at least update the documentation to state this.
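
For illustration, a minimal sketch of the conversion described above, assuming the application data is ISO-8859-1 and the mbstring extension is available. The helper name to_dom_string is hypothetical and not part of the DOM extension; mb_convert_encoding is used here instead of utf8_encode, which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper (not part of the DOM extension): convert an
// ISO-8859-1 string to UTF-8 before handing it to any DOM method.
function to_dom_string(string $latin1): string
{
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

// ISO-8859-1 bytes written as escapes, so the example does not depend
// on the encoding this script file is saved in.
$latin1 = "I\xF1t\xEBrn\xE2ti\xF4n\xE0liz\xE6ti\xF8n";

$dom  = new DOMDocument('1.0', 'UTF-8');
$root = $dom->appendChild($dom->createElement('document'));
$root->appendChild($dom->createTextNode(to_dom_string($latin1)));

// Serialized using the encoding given to the constructor (UTF-8 here).
$xml = $dom->saveXML();
echo $xml;
```

Wrapping the conversion in one function keeps the encoding assumption in a single place instead of scattering it across every DOM call.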


 [2006-04-21 15:02 UTC] tony2001@php.net
Wrong.
The *default* input encoding is UTF-8, but you can always declare another one with <?xml version="1.0" encoding="<your encoding>"?>.
All result data is in UTF-8 anyway; this is a libxml2 feature.
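
A minimal sketch of that behavior, assuming the XML arrives as an ISO-8859-1 byte string parsed with DOMDocument::loadXML. The sample element and text are made up for illustration.

```php
<?php
// ISO-8859-1 input: "\xE9" is "é" in that encoding. The XML declaration
// tells the parser how to decode the bytes.
$latin1Xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
           . "<document>caf\xE9</document>";

$dom = new DOMDocument();
$dom->loadXML($latin1Xml);

// Strings read back from the tree are UTF-8: "é" is now "\xC3\xA9".
$text = $dom->documentElement->textContent;
var_dump($text === "caf\xC3\xA9");

// saveXML() serializes with the document's declared encoding again.
$out = $dom->saveXML();
```

The declaration only affects parsing and serialization of whole documents; strings passed to methods such as createTextNode are still expected to be UTF-8.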
 [2006-04-21 15:07 UTC] troelskn at gmail dot com
Not true.

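// Candidate encodings for mb_detect_encoding(), checked in order.
$mb_detect_charsets = "ASCII,UTF-8,ISO-8859-1";

// Assumption: this script file is saved as ISO-8859-1, so utf8_encode()
// converts the string literal to UTF-8 before it reaches the DOM.

// Case 1: document declared as UTF-8.
$dom = new DOMDocument("1.0", "UTF-8");
$doc = $dom->appendChild($dom->createElement("document"));
$doc->appendChild($dom->createTextNode(utf8_encode("Iñtërnâtiônàlizætiøn")));

echo mb_detect_encoding($dom->saveXML(), $mb_detect_charsets) . "<br>";

// Case 2: document declared as ISO-8859-1; the input is still UTF-8 encoded.
$dom = new DOMDocument("1.0", "ISO-8859-1");
$doc = $dom->appendChild($dom->createElement("document"));
$doc->appendChild($dom->createTextNode(utf8_encode("Iñtërnâtiônàlizætiøn")));

echo mb_detect_encoding($dom->saveXML(), $mb_detect_charsets) . "<br>";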

-------------------------------------------------------
outputs :
UTF-8
ISO-8859-1
-------------------------------------------------------
Removing utf8_encode crashes the second example.
 