[2011-12-07 21:02 UTC] jon at bigjjs dot com
Description:
------------
DOMDocument::createCDATASection() leaves the ASCII control character \x10 in
the CDATA section, which makes the resulting XML invalid. It seems like this
character should be handled by the function. Otherwise it has to be escaped with:
$dom->createCDATASection(preg_replace('/[\x10]/', '', $value));
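The pattern above only removes \x10. XML 1.0 allows only #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF], so a broader filter could look like the following sketch. The helper name stripInvalidXml10Chars() is hypothetical and is not part of PHP's DOM API. The sketch assumes the input is UTF-8, because with the /u modifier preg_replace() returns null on malformed UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP's DOM API): removes every character
// outside the XML 1.0 Char production. Assumes $value is UTF-8; with the /u
// modifier preg_replace() returns null on malformed UTF-8 input.
function stripInvalidXml10Chars(string $value): string
{
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $value
    );
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('element');
$dom->appendChild($e);
// Filter the text before it reaches createCDATASection().
$e->appendChild($dom->createCDATASection(stripInvalidXml10Chars("something\x10something")));
print $dom->saveXML();
```

Tab, newline and carriage return are kept because XML 1.0 allows them. Every other C0 control character is dropped silently.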
Test script:
---------------
<?php
// Serve the output as XML so the browser parses it.
header("Content-type: application/xml; charset=UTF-8");
$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('element');
$dom->appendChild($e);
// The CDATA content contains the raw control character U+0010.
$e->appendChild($dom->createCDATASection("something\x10something"));
print $dom->saveXML();
exit();
Expected result:
----------------
I would expect well-formed XML.
Actual result:
--------------
Attempting to read it with DOMDocument gives:
DOMDocument::loadXML() [domdocument.loadxml]:
CData section not finished
Opening the above script in Chrome gives:
This page contains the following errors:
error on line 2 at column 19: Input is not proper UTF-8, indicate encoding !
Bytes: 0x10 0x73 0x6F 0x6D
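You can reproduce the parse failure without a browser by feeding the serialized output straight back into DOMDocument::loadXML(). The sketch below collects libxml errors with libxml_use_internal_errors() instead of emitting warnings. The exact wording of those messages depends on the libxml2 version, so the sketch shows no expected text for them.

```php
<?php
// Collect libxml errors instead of printing warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('element');
$dom->appendChild($e);
$e->appendChild($dom->createCDATASection("something\x10something"));
$xml = $dom->saveXML();  // the raw 0x10 byte is written inside <![CDATA[...]]>

// Try to parse the serialized document again.
$reparsed = new DOMDocument();
$ok = $reparsed->loadXML($xml);  // false: 0x10 is not an XML 1.0 Char
$errors = libxml_get_errors();
libxml_clear_errors();

var_dump($ok);
foreach ($errors as $error) {
    echo trim($error->message), PHP_EOL;
}
```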
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Sat Nov 01 02:00:01 2025 UTC
That is not a bug, but rather expected behavior. The specification of Document::createCDATASection()[1] states:

| Creates a CDATASection node whose value is the specified string.

No special handling of illegal characters is mentioned there. The documentation of "Interface CDATASection"[2] even says:

| Note that this may contain characters […] that, depending on the
| character encoding ("charset") chosen for serialization, it may
| be impossible to write out some characters as part of a CDATA
| section.

Also note that all ASCII control characters in the range [0x1, 0x1f] are legal characters (Char) in XML 1.1[3]. However, all of them except #x9, #xA and #xD are classified there as RestrictedChar. They may therefore appear only as character references such as &#x10;, and character references are not recognized inside a CDATA section.

> Otherwise it has to be escaped with:

That might be the reason why the specs don't define special handling of illegal characters: it's not clear what should be done with them. Just removing them might not be desired in all cases. In some use cases these characters would have to be substituted by other characters (e.g. by U+FFFD REPLACEMENT CHARACTER).

[1] <https://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-D26C0AF8>
[2] <https://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-667469212>
[3] <https://www.w3.org/TR/xml11/#charsets>
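The substitution alternative mentioned above could be sketched in PHP as follows. The helper replaceInvalidXml10Chars() and its default of U+FFFD are illustrative assumptions, not part of PHP's DOM API. Whether removal or substitution is appropriate is left to the caller, which matches the ambiguity the specs leave open. The sketch assumes valid UTF-8 input and targets XML 1.0 output.

```php
<?php
// Hypothetical helper (not part of PHP's DOM API): instead of dropping
// characters that XML 1.0 forbids, substitute them. The default substitute
// is U+FFFD REPLACEMENT CHARACTER. Assumes $value is valid UTF-8.
function replaceInvalidXml10Chars(string $value, string $replacement = "\u{FFFD}"): string
{
    // The callback returns $replacement verbatim, so "$1" or "\1" inside it
    // is never interpreted as a backreference (unlike with preg_replace()).
    $clean = preg_replace_callback(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        static fn (array $match): string => $replacement,
        $value
    );
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('element');
$dom->appendChild($e);
$e->appendChild($dom->createCDATASection(replaceInvalidXml10Chars("something\x10something")));
$xml = $dom->saveXML();

// The serialized document can now be parsed again.
$check = new DOMDocument();
var_dump($check->loadXML($xml)); // bool(true)
```

Passing an empty string as $replacement reduces this to plain removal. A callback is used rather than preg_replace() so that an arbitrary substitute string is never treated as a backreference.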