Request #49731 Illegal XML characters allowed in (not stripped from) text nodes.
Submitted: 2009-10-01 06:47 UTC Modified: 2021-02-12 12:54 UTC
Votes:5
Avg. Score:4.2 ± 0.7
Reproduced:4 of 5 (80.0%)
Same Version:2 (50.0%)
Same OS:3 (75.0%)
From: moltendorf at gmail dot com Assigned: cmb
Status: Not a bug Package: DOM XML related
PHP Version: * OS: *
Private report: No CVE-ID: None
 [2009-10-01 06:47 UTC] moltendorf at gmail dot com
Description:
------------
Illegal characters added to a text node are neither converted to numeric character references (NCRs) nor stripped when the document is saved as XML. This breaks the XML, which standards-compliant browsers such as Firefox then report as "not well-formed".

Reproduce code:
---------------
<?php
header ('Content-Type: text/xml');
$document = new DOMDocument ('1.0', 'utf-8');
$element = $document -> createElement ('element');
$text = $document -> createTextNode ("\x06"); // U+0006 (ACK), written as an escape because the raw character does not survive copy and paste; it is invalid in XML 1.0 according to the W3C specification.
$element -> appendChild ($text);
$document -> appendChild ($element);
echo $document -> saveXML ( );

Expected result:
----------------
<?xml version="1.0" encoding="utf-8"?>
<element/>

Actual result:
--------------
<?xml version="1.0" encoding="utf-8"?>
<element></element>
(with the raw U+0006 character between the two tags, which makes the output not well-formed)

History

 [2009-10-01 06:51 UTC] moltendorf at gmail dot com
Illegal characters added to a text node are not stripped when the document is saved as XML. This breaks the XML, which standards-compliant browsers such as Firefox then report as "not well-formed". Even converting them to NCRs still breaks the XML in most browsers.

For reference, the NCR of the illegal character included in the reproduce code is &#6; (U+0006, ACK).
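A quick way to see why neither form helps is to feed both back into libxml2 through PHP's DOM extension. The sketch below assumes PHP 8 (where `DOMDocument::loadXML()` returns a plain bool) and relies on the default XML version being 1.0 when no declaration is present; `parses_as_xml()` is a hypothetical helper name. Under XML 1.0 both the raw byte and the `&#6;` reference should be rejected, while a legal reference such as `&#9;` (tab) parses.

```php
<?php
// Collect libxml2 parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: returns true if libxml2 accepts $xml as well-formed.
function parses_as_xml(string $xml): bool
{
    $document = new DOMDocument();
    $ok = $document->loadXML($xml); // false when the input is not well-formed
    libxml_clear_errors();          // discard errors from this attempt
    return $ok;
}

// No XML declaration, so these documents are treated as XML 1.0.
$raw   = parses_as_xml("<element>\x06</element>");  // raw U+0006 byte
$ncr   = parses_as_xml('<element>&#6;</element>');  // NCR for U+0006
$legal = parses_as_xml('<element>&#9;</element>');  // NCR for TAB, a legal Char

var_dump($raw, $ncr, $legal);
```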
 [2009-10-01 07:00 UTC] moltendorf at gmail dot com
Made the title more accurate. The previous comment was intended to replace the existing description of this bug: invalid characters should either be stripped entirely or cause an exception to be thrown (with the text node not added), and/or a method should be provided that strips illegal characters from the input without throwing an exception.
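Until something like that exists in the DOM extension, the stripping can be done in userland before the text reaches `createTextNode()`. This is a minimal sketch assuming UTF-8 input and the XML 1.0 `Char` production; `strip_invalid_xml_chars()` is a hypothetical name, not part of PHP's DOM API. It strips silently, and throws only when the input is not valid UTF-8 (the case where `preg_replace()` with the `u` modifier returns null).

```php
<?php
// Hypothetical helper: remove every code point outside the XML 1.0 Char production.
function strip_invalid_xml_chars(string $text): string
{
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    $stripped = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );
    if ($stripped === null) {
        // preg_replace() signals an error (e.g. malformed UTF-8) by returning null.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $stripped;
}

// The reproduce code from the report, with the input cleaned first.
$document = new DOMDocument('1.0', 'utf-8');
$element = $document->createElement('element');
$clean = strip_invalid_xml_chars("\x06");
if ($clean !== '') {
    // Skip empty text nodes so the element serializes as <element/>.
    $element->appendChild($document->createTextNode($clean));
}
$document->appendChild($element);
echo $document->saveXML();
```

Skipping the empty text node is a deliberate choice: an element that has an empty text child is serialized with separate start and end tags rather than as the self-closing form shown under "Expected result".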
 [2014-07-13 02:23 UTC] yohgaki@php.net
-Type: Bug +Type: Feature/Change Request -PHP Version: 5.2.11 +PHP Version: *
 [2014-07-13 02:23 UTC] yohgaki@php.net
Converting to numeric character references (NCRs), or any other conversion, would break existing applications.
 [2021-02-12 12:54 UTC] cmb@php.net
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2021-02-12 12:54 UTC] cmb@php.net
Indeed, most ASCII control characters (such as ACK in this case)
are invalid in XML 1.0, but allowed in XML 1.1 (where they must be
written as character references).  Apparently, libxml2 doesn't
properly distinguish between the two versions, but that would be an
upstream issue.
 
PHP Copyright © 2001-2021 The PHP Group
All rights reserved.
Last updated: Mon Mar 01 19:01:24 2021 UTC