Request #49731 Illegal XML characters allowed in (not stripped from) text nodes.
Submitted: 2009-10-01 06:47 UTC Modified: 2021-02-12 12:54 UTC
Votes:5
Avg. Score:4.2 ± 0.7
Reproduced:4 of 5 (80.0%)
Same Version:2 (50.0%)
Same OS:3 (75.0%)
From: moltendorf at gmail dot com Assigned: cmb (profile)
Status: Not a bug Package: DOM XML related
PHP Version: * OS: *
Private report: No CVE-ID: None

 

 [2009-10-01 06:47 UTC] moltendorf at gmail dot com
Description:
------------
Illegal characters added to a text node are neither converted to NCRs nor stripped when the document is saved as XML. This breaks the XML, causing it to be "not well-formed" when viewed in standards-compliant browsers such as Firefox.

Reproduce code:
---------------
<?php
header ('Content-Type: text/xml');
$document = new DOMDocument ('1.0', 'utf-8');
$element = $document -> createElement ('element');
$text = $document -> createTextNode ("\x06"); // U+0006 (ACK), written as an escape sequence because the raw control character does not display or copy reliably; it is invalid in XML 1.0 according to the W3C specification.
$element -> appendChild ($text);
$document -> appendChild ($element);
echo $document -> saveXML ( );

Expected result:
----------------
<?xml version="1.0" encoding="utf-8"?>
<element/>

Actual result:
--------------
<?xml version="1.0" encoding="utf-8"?>
<element></element>

History

 [2009-10-01 06:51 UTC] moltendorf at gmail dot com
Illegal characters added to a text node are not stripped when the document is saved as XML. This breaks the XML, causing it to be "not well-formed" when viewed in standards-compliant browsers such as Firefox. Even converting them to NCRs breaks the XML in most browsers.

For reference, the NCR of the "illegal character" included in the reproduce code is &#6;
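
The claim that neither the raw character nor its NCR is acceptable can be checked with a small sketch like the one below. It assumes the DOM extension is loaded and that the input is treated as XML 1.0 (no version declaration); `parsesAsXml` is a hypothetical helper name used only for illustration, not part of PHP.

```php
<?php
// Collect libxml2 errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

/**
 * Hypothetical helper: returns true if libxml2 accepts $xml as
 * well-formed XML, false otherwise.
 */
function parsesAsXml(string $xml): bool
{
    $document = new DOMDocument();
    $ok = $document->loadXML($xml);
    libxml_clear_errors(); // discard the collected parse errors
    return $ok;
}

$raw  = "<element>\x06</element>";  // raw U+0006 (ACK) in a text node
$ncr  = '<element>&#6;</element>';  // the same character as an NCR
$fine = '<element>ok</element>';    // control case without illegal characters

var_dump(parsesAsXml($raw), parsesAsXml($ncr), parsesAsXml($fine));
```

Both the raw form and the NCR form should be rejected, which is why converting to NCRs alone does not make the output well-formed.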
 [2009-10-01 07:00 UTC] moltendorf at gmail dot com
Made the title more accurate. My previous comment was intended to replace the original description of this bug: invalid characters should be stripped entirely, or an exception should be thrown (and the text node not added), and/or a method should be provided to strip illegal characters from the input without throwing an exception.
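
Until something like this exists in the extension, the stripping can be done in userland before the text node is created. The following is only a sketch: `stripInvalidXmlChars` is a hypothetical helper, not an ext/dom function, and it assumes the input is UTF-8. The character class is the negation of the `Char` production from the XML 1.0 specification.

```php
<?php
/**
 * Hypothetical helper (not part of ext/dom): removes every code point
 * that the XML 1.0 "Char" production does not allow.
 * Assumes $text is UTF-8; throws if it is not.
 */
function stripInvalidXmlChars(string $text): string
{
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );
    if ($clean === null) {
        // With the /u modifier, preg_replace() returns null on malformed UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $clean;
}

$document = new DOMDocument('1.0', 'utf-8');
$element  = $document->createElement('element');
// The ACK character between "a" and "b" is dropped before it reaches the DOM.
$element->appendChild($document->createTextNode(stripInvalidXmlChars("a\x06b")));
$document->appendChild($element);
echo $document->saveXML();
```

Throwing on malformed UTF-8 rather than silently returning an empty string is a design choice of this sketch; a caller that prefers lossy cleanup could catch the exception and substitute its own fallback.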
 [2014-07-13 02:23 UTC] yohgaki@php.net
-Type: Bug
+Type: Feature/Change Request
-PHP Version: 5.2.11
+PHP Version: *
 [2014-07-13 02:23 UTC] yohgaki@php.net
Converting to Numeric Character References (NCRs), or any other conversion, would break existing apps.
 [2021-02-12 12:54 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2021-02-12 12:54 UTC] cmb@php.net
Indeed, most ASCII control characters (such as ACK in this case)
are invalid in XML 1.0, but allowed in XML 1.1 (there only as
character references).  Apparently, libxml2 doesn't properly
distinguish between the two versions, but that would be an
upstream issue.
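
For reference, the difference between the two versions can be written down as plain code point checks. This is a sketch based on the `Char` productions of the XML 1.0 and XML 1.1 specifications and the `RestrictedChar` production of XML 1.1; the function names are hypothetical and nothing here calls into libxml2.

```php
<?php
// XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function isXml10Char(int $cp): bool
{
    return $cp === 0x9 || $cp === 0xA || $cp === 0xD
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

// XML 1.1: Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function isXml11Char(int $cp): bool
{
    return ($cp >= 0x1 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

// XML 1.1: RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
// These are legal in XML 1.1 only when written as character references.
function isXml11RestrictedChar(int $cp): bool
{
    return ($cp >= 0x1 && $cp <= 0x8)
        || ($cp >= 0xB && $cp <= 0xC)
        || ($cp >= 0xE && $cp <= 0x1F)
        || ($cp >= 0x7F && $cp <= 0x84)
        || ($cp >= 0x86 && $cp <= 0x9F);
}

$ack = 0x06; // the character from the reproduce code
var_dump(isXml10Char($ack));           // bool(false)
var_dump(isXml11Char($ack));           // bool(true)
var_dump(isXml11RestrictedChar($ack)); // bool(true)
```

So even in an XML 1.1 document, ACK would have to be serialized as `&#6;` rather than as a raw byte.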
 