Request #49731 Illegal XML characters allowed in (not stripped from) text nodes.
Submitted: 2009-10-01 06:47 UTC Modified: 2014-07-13 02:23 UTC
Votes:5
Avg. Score:4.2 ± 0.7
Reproduced:4 of 5 (80.0%)
Same Version:2 (50.0%)
Same OS:3 (75.0%)
From: moltendorf at gmail dot com Assigned:
Status: Open Package: DOM XML related
PHP Version: * OS: *
Private report: No CVE-ID: None
 [2009-10-01 06:47 UTC] moltendorf at gmail dot com
Description:
------------
Illegal characters added to a text node are neither converted to NCRs nor stripped when the document is saved as XML. This breaks the XML, making it "not well-formed" in standards-compliant browsers such as Firefox.

Reproduce code:
---------------
<?php
header ('Content-Type: text/xml');
$document = new DOMDocument ('1.0', 'utf-8');
$element = $document -> createElement ('element');
$text = $document -> createTextNode ("\x06"); // U+0006, a control character that is invalid in XML based on the W3C specification; written as an escape sequence so it survives copy and paste.
$element -> appendChild ($text);
$document -> appendChild ($element);
echo $document -> saveXML ( );

Expected result:
----------------
<?xml version="1.0" encoding="utf-8"?>
<element/>

Actual result:
--------------
<?xml version="1.0" encoding="utf-8"?>
<element></element>

 [2009-10-01 06:51 UTC] moltendorf at gmail dot com
Illegal characters added to a text node are not stripped when the document is saved as XML. This breaks the XML, making it "not well-formed" in standards-compliant browsers such as Firefox. Even converting them to NCRs breaks the XML in most browsers.

For reference, the NCR of the "illegal character" included in the reproduce code is &#6;
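
The claim that an NCR does not help can be checked with the parser itself. The following is a minimal sketch, assuming PHP's DOM extension (backed by libxml2) is available; the `$candidates` array and its labels are illustrative names only. It feeds both the raw U+0006 character and its NCR to `DOMDocument::loadXML()` and reports whether each parses.

```php
<?php
// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Two serializations of the same illegal character (U+0006).
$candidates = array(
    'raw character' => "<element>\x06</element>",
    'NCR'           => '<element>&#6;</element>',
);

foreach ($candidates as $label => $xml) {
    $document = new DOMDocument();
    // loadXML() returns false when the input is not well-formed.
    $ok = $document->loadXML($xml);
    echo $label, ': ', $ok ? 'well-formed' : 'not well-formed', "\n";
    libxml_clear_errors();
}
```

Both inputs should be rejected, since XML 1.0 excludes U+0006 from its allowed character range whether it appears literally or as a character reference.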
 [2009-10-01 07:00 UTC] moltendorf at gmail dot com
I made the title more accurate. My last comment was intended to replace the existing description for this bug. Invalid characters should be stripped entirely, or an exception should be thrown (and the text node not added), and/or a method should be provided to strip illegal characters from the input without throwing an exception.
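
Until something like this exists in the extension, the stripping can be done in userland before the text reaches DOM. The following is a rough sketch of such a workaround, not an existing DOM API: `strip_illegal_xml_chars()` is a hypothetical helper name, the character ranges are taken from the `Char` production of the XML 1.0 specification, and PHP 7 or later with UTF-8 input is assumed.

```php
<?php
// Hypothetical helper (not part of the DOM extension): removes every code
// point that falls outside the XML 1.0 "Char" production.
function strip_illegal_xml_chars(string $text): string
{
    // Allowed: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );
    // With the /u modifier, preg_replace() returns null for invalid UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $clean;
}

// Usage: sanitize the text before creating the node.
$document = new DOMDocument('1.0', 'utf-8');
$element = $document->createElement('element');
$element->appendChild($document->createTextNode(strip_illegal_xml_chars("a\x06b")));
$document->appendChild($element);
echo $document->saveXML();
```

Silently dropping characters is only one of the options listed above; a caller who prefers the exception behaviour could compare the helper's result with its input and throw when they differ.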
 [2014-07-13 02:23 UTC] yohgaki@php.net
-Type: Bug
+Type: Feature/Change Request
-PHP Version: 5.2.11
+PHP Version: *
 [2014-07-13 02:23 UTC] yohgaki@php.net
Converting to a Numeric Character Reference (NCR), or any other conversion, breaks apps.
 
Last updated: Mon Jan 27 12:01:24 2020 UTC