Bug #60463 createcdatasection and ASCII \x10
Submitted: 2011-12-07 21:02 UTC
Modified: 2016-07-31 13:10 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: jon at bigjjs dot com
Assigned: cmb
Status: Not a bug
Package: DOM XML related
PHP Version: Irrelevant
OS: RHEL 6
Private report: No
CVE-ID: None
 [2011-12-07 21:02 UTC] jon at bigjjs dot com
Description:
------------
DOMDocument::createCDATASection() leaves the ASCII control character \x10 in
its output, which makes the resulting XML invalid. It seems like this character
should be handled by the function. Otherwise it has to be escaped with:
$dom->createCDATASection(preg_replace('/[\x10]/', '', $value));

Test script:
---------------
<?php
header("Content-Type: application/xml; charset=UTF-8");
$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('element');
$dom->appendChild($e);
// The CDATA section contains the raw control character U+0010.
$e->appendChild($dom->createCDATASection("something\x10something"));
print $dom->saveXML();
exit();

Expected result:
----------------
I would expect clean xml.

Actual result:
--------------
Attempting to read it with DOMDocument gives:
DOMDocument::loadXML(): CData section not finished

Opening the above script in Chrome gives:

This page contains the following errors:
error on line 2 at column 19: Input is not proper UTF-8, indicate encoding !
Bytes: 0x10 0x73 0x6F 0x6D
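
The round trip described above can be sketched as follows. This is only a minimal reproduction, assuming the DOM extension is loaded; the variable names ($xml, $reloaded, $ok, $errors) are illustrative, and the exact libxml error text may differ between libxml2 versions.

```php
<?php
// Build the same document as the test script and serialize it.
$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('element');
$dom->appendChild($e);
$e->appendChild($dom->createCDATASection("something\x10something"));
$xml = $dom->saveXML();

// Collect libxml errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$reloaded = new DOMDocument();
$ok = $reloaded->loadXML($xml);
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo $ok ? "reloaded\n" : "reload failed\n";
foreach ($errors as $error) {
    // Each entry is a LibXMLError object; print its message only.
    echo trim($error->message), "\n";
}
```

Using libxml_use_internal_errors() keeps the parser diagnostics available for inspection instead of having them printed as warnings.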

 [2016-07-31 13:10 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2016-07-31 13:10 UTC] cmb@php.net
That is not a bug, but rather expected behavior. The specification
of Document::createCDATASection()[1] states:

| Creates a CDATASection node whose value is the specified string.

No special handling of illegal characters is mentioned there. The
documentation of "Interface CDATASection"[2] even says:

| Note that this may contain characters […] that, depending on the
| character encoding ("charset") chosen for serialization, it may
| be impossible to write out some characters as part of a CDATA
| section.

Also note that all ASCII control characters in the range [0x1,
0x1f] are valid for XML 1.1[3].
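
For comparison, the test script creates an XML 1.0 document, and the Char production of XML 1.0 is narrower: below U+0020 it admits only tab, line feed and carriage return. A minimal sketch of a check against that production, assuming UTF-8 input (isXml10Text is a hypothetical helper name, not part of the DOM extension):

```php
<?php
/**
 * Hypothetical helper: returns true if $s is valid UTF-8 and every
 * code point matches the XML 1.0 Char production.
 */
function isXml10Text(string $s): bool
{
    // preg_match() returns false for invalid UTF-8, which also yields false here.
    return preg_match(
        '/^[\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]*\z/u',
        $s
    ) === 1;
}

var_dump(isXml10Text("something\x10something")); // bool(false)
var_dump(isXml10Text("tab\tand newline\n"));     // bool(true)
```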

> Otherwise it has to be escaped with:

And that might be the reason why the specs don't define special
handling of illegal characters: simply because it's not clear what
should be done with those. Just removing them might not be desired
in all cases; there may be use-cases where these characters would
have to be substituted by other characters (e.g. by
U+FFFD REPLACEMENT CHARACTER).
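
Both options mentioned above, removal and substitution, can be left to the caller. The following is only a sketch under the assumption that the input is UTF-8 and the target is XML 1.0; XML10_ILLEGAL and sanitizeForXml10 are hypothetical names, not part of PHP.

```php
<?php
// Matches any code point outside the XML 1.0 Char production.
const XML10_ILLEGAL = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

/**
 * Hypothetical helper: removes illegal characters by default, or
 * substitutes $replacement for each of them.
 */
function sanitizeForXml10(string $value, string $replacement = ''): string
{
    $clean = preg_replace(XML10_ILLEGAL, $replacement, $value);
    if ($clean === null) {
        // preg_replace() returns null on error, e.g. for invalid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('element');
$dom->appendChild($e);

// Variant 1: drop the character.
$e->appendChild($dom->createCDATASection(
    sanitizeForXml10("something\x10something")
));

// Variant 2: substitute U+FFFD instead.
$e->appendChild($dom->createCDATASection(
    sanitizeForXml10("something\x10something", "\u{FFFD}")
));
```

Which variant is appropriate depends on the application, which is the point made above: the DOM method itself cannot know.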

[1] <https://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-D26C0AF8>
[2] <https://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-667469212>
[3] <https://www.w3.org/TR/xml11/#charsets>