Bug #60463 createcdatasection and ASCII \x10
Submitted: 2011-12-07 21:02 UTC Modified: 2016-07-31 13:10 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: jon at bigjjs dot com Assigned: cmb (profile)
Status: Not a bug Package: DOM XML related
PHP Version: Irrelevant OS: RHEL 6
Private report: No CVE-ID: None

 [2011-12-07 21:02 UTC] jon at bigjjs dot com
Description:
------------
DOMDocument::createCDATASection() leaves the ASCII control character \x10 in
its output, which makes the resulting XML invalid. It seems like this character
should be handled by the function. Otherwise it has to be stripped manually with:
$dom->createCDATASection(preg_replace('/[\x10]/', '', $value));

Test script:
---------------
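<?php
// Serve the output as XML so a browser attempts to parse it.
header("Content-type: application/xml; charset=UTF-8");
$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('element');
$dom->appendChild($e);
// \x10 (DLE) is not a legal XML 1.0 character, yet it is passed through unchanged.
$e->appendChild($dom->createCDATASection("something\x10something"));
print $dom->saveXML();
exit();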

Expected result:
----------------
I would expect clean xml.

Actual result:
--------------
Attempting to read it with DOMDocument gives:
DOMDocument::loadXML(): CData section not finished

Opening the above script in chrome gives:

This page contains the following errors:
error on line 2 at column 19: Input is not proper UTF-8, indicate encoding !
Bytes: 0x10 0x73 0x6F 0x6D
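
The round trip described above can be checked without a browser. The following is a minimal sketch, assuming the same document as in the test script; it collects libxml errors through libxml_use_internal_errors() instead of letting them surface as PHP warnings. The exact error text depends on the libxml2 version in use, so none is shown here.

```php
<?php
// Build and serialize the document from the test script.
$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('element');
$dom->appendChild($e);
$e->appendChild($dom->createCDATASection("something\x10something"));
$xml = $dom->saveXML();

// Collect parser errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$reloaded = new DOMDocument();
$ok = $reloaded->loadXML($xml);

// Each entry is a LibXMLError; the message wording varies by libxml2 version.
foreach (libxml_get_errors() as $error) {
    printf("line %d, column %d: %s", $error->line, $error->column, $error->message);
}
libxml_clear_errors();

// $ok is false when the serialized output could not be parsed again.
var_dump($ok);
```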

 [2016-07-31 13:10 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2016-07-31 13:10 UTC] cmb@php.net
That is not a bug, but rather expected behavior. The specification
of Document::createCDATASection()[1] states:

| Creates a CDATASection node whose value is the specified string.

No special handling of illegal characters is mentioned there. The
documentation of "Interface CDATASection"[2] even says:

| Note that this may contain characters […] that, depending on the
| character encoding ("charset") chosen for serialization, it may
| be impossible to write out some characters as part of a CDATA
| section.

Also note that all ASCII control characters in the range [0x1,
0x1f] are valid for XML 1.1[3].

> Otherwise it has to be escaped with:

And that might be the reason why the specs don't define special
handling of illegal characters: simply because it's not clear what
should be done with those. Just removing them might not be desired
in all cases; there may be use-cases where these characters would
have to be substituted by other characters (e.g. by U+FFFD
REPLACEMENT CHARACTER).
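
Since the choice between removing and substituting is left to the caller, one way to handle it in userland is a small helper applied before createCDATASection(). This is only a sketch: the function name xml10_clean() and its $replacement parameter are hypothetical and not part of the DOM extension. The pattern matches every code point outside the Char production of XML 1.0 and assumes UTF-8 input.

```php
<?php
/**
 * Hypothetical helper: replace every code point that is not a legal
 * XML 1.0 character. Pass '' to strip, or e.g. "\u{FFFD}" to substitute.
 */
function xml10_clean(string $value, string $replacement = ''): string
{
    // Negated XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    // | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $result = preg_replace($pattern, $replacement, $value);
    if ($result === null) {
        // With the /u modifier, preg_replace() returns null for malformed UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('element');
$dom->appendChild($e);

// Strip the control character, or substitute U+FFFD instead.
$stripped = xml10_clean("something\x10something");
$replaced = xml10_clean("something\x10something", "\u{FFFD}");

$e->appendChild($dom->createCDATASection($stripped));
print $dom->saveXML();
```

Note that this only covers illegal characters; a value containing the sequence "]]>" is a separate concern for CDATA sections and is not addressed here.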

[1] <https://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-D26C0AF8>
[2] <https://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-667469212>
[3] <https://www.w3.org/TR/xml11/#charsets>
 