PHP :: Bug #60463 :: createcdatasection and ASCII \x10

Bug #60463

createcdatasection and ASCII \x10

Submitted:

2011-12-07 21:02 UTC

Modified:

2016-07-31 13:10 UTC

Votes:	1
Avg. Score:	5.0 ± 0.0
Reproduced:	1 of 1 (100.0%)
Same Version:	0 (0.0%)
Same OS:	0 (0.0%)

From:

jon at bigjjs dot com

Assigned:

cmb (profile)

Status:

Not a bug

Package:

DOM XML related

PHP Version:

Irrelevant

OS:

RHEL 6

Private report:

CVE-ID:

None

[2011-12-07 21:02 UTC] jon at bigjjs dot com

Description:
------------
DOMDocument::createCDATASection leaves the ASCII control character \x10 in
place, which makes the resulting XML invalid. It seems like this character
should be handled by the function. Otherwise it has to be escaped with:
$dom->createCDATASection(preg_replace('/[\x10]/', '', $value));
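
The one-liner above only covers \x10. A more general workaround could drop every code point that the XML 1.0 "Char" production excludes. The following is a minimal sketch, not something the DOM extension provides: the function name stripInvalidXml10Chars is hypothetical, and the code assumes PHP 7 or later and UTF-8 input.

```php
<?php
// Hypothetical helper (not part of ext/dom): removes every code point that
// XML 1.0 does not allow, i.e. anything outside
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
function stripInvalidXml10Chars(string $value): string
{
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $value
    );
    // With the /u modifier, preg_replace() returns null for malformed UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->appendChild($dom->createElement('element'));
$e->appendChild(
    $dom->createCDATASection(stripInvalidXml10Chars("something\x10something"))
);
echo $dom->saveXML();
```

Tab, line feed and carriage return are left alone because XML 1.0 permits them.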

Test script:
---------------
<?php
header("Content-Type: application/xml; charset=UTF-8");
$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('element');
$dom->appendChild($e);
// "\x10" is a control character that XML 1.0 does not allow in a document.
$e->appendChild($dom->createCDATASection("something\x10something"));
print $dom->saveXML();
exit();

Expected result:
----------------
I would expect well-formed XML.

Actual result:
--------------
Attempting to read it with DOMDocument gives:
DOMDocument::loadXML() [domdocument.loadxml]: CData section not finished

Opening the output of the above script in Chrome gives:

This page contains the following errors:
error on line 2 at column 19: Input is not proper UTF-8, indicate encoding !
Bytes: 0x10 0x73 0x6F 0x6D
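
The failure can also be observed without a browser by feeding the serialized output straight back into a second DOMDocument. This is only a sketch of such a round-trip check; the variable names are arbitrary, and the exact messages depend on the libxml2 version in use.

```php
<?php
// Build the same document as in the test script.
$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->appendChild($dom->createElement('element'));
$e->appendChild($dom->createCDATASection("something\x10something"));
$xml = $dom->saveXML();

// Collect parser errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$check = new DOMDocument();
$ok = $check->loadXML($xml);
$errors = libxml_get_errors();
libxml_clear_errors();

foreach ($errors as $error) {
    printf(
        "line %d, column %d: %s\n",
        $error->line,
        $error->column,
        trim($error->message)
    );
}
var_dump($ok);
```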

[2016-07-31 13:10 UTC] cmb@php.net

-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb

[2016-07-31 13:10 UTC] cmb@php.net

That is not a bug, but rather expected behavior. The specification
of Document::createCDATASection()[1] states:

| Creates a CDATASection node whose value is the specified string.

No special handling of illegal characters is mentioned there. The
documentation of "Interface CDATASection"[2] even says:

| Note that this may contain characters […] that, depending on the
| character encoding ("charset") chosen for serialization, it may
| be impossible to write out some characters as part of a CDATA
| section.

Also note that all ASCII control characters in the range [0x1,
0x1f] are valid for XML 1.1[3].

> Otherwise it has to be escaped with:

And that might be the reason why the specs don't define special
handling of illegal characters: it is simply not clear what
should be done with them. Just removing them might not be desired
in all cases; there may be use cases where these characters would
have to be substituted by other characters (e.g. by U+FFFD
REPLACEMENT CHARACTER).

[1] <https://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-D26C0AF8>
[2] <https://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-667469212>
[3] <https://www.w3.org/TR/xml11/#charsets>
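
Since the choice between removal and substitution is left to the caller, one way to handle it in userland is a helper that takes the replacement as a parameter. The sketch below is an illustration under assumptions, not part of the DOM extension: the name replaceInvalidXml10Chars is hypothetical, PHP 7 or later is assumed for the "\u{FFFD}" escape, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: substitutes every code point that XML 1.0 does not
// allow. The default replacement is U+FFFD REPLACEMENT CHARACTER; passing an
// empty string removes the offending characters instead.
function replaceInvalidXml10Chars(string $value, string $replacement = "\u{FFFD}"): string
{
    $result = preg_replace_callback(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        // A callback keeps "$1"-style sequences in $replacement literal.
        function () use ($replacement) {
            return $replacement;
        },
        $value
    );
    // With the /u modifier, null signals malformed UTF-8 input.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->appendChild($dom->createElement('element'));
$e->appendChild(
    $dom->createCDATASection(replaceInvalidXml10Chars("something\x10something"))
);
echo $dom->saveXML();
```

Calling replaceInvalidXml10Chars($value, '') gives the removal behavior from the original report, while the default keeps a visible marker where a character was dropped.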
