Bug #40105 DOMDocument->createElement() unescapes numeric character references in values
Submitted: 2007-01-11 23:26 UTC Modified: 2007-01-19 06:17 UTC
From: fletch at pobox dot com Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.2.0 OS: linux
Private report: No CVE-ID: None
 [2007-01-11 23:26 UTC] fletch at pobox dot com
Description:
------------
DOMDocument->createElement() unescapes any numeric character references (NCRs) contained in the value passed to its optional second parameter.  It should preserve the value as passed.

Reproduce code:
---------------
<?php
$dom = new DOMDocument( "1.0", 'UTF-8' );
$dom->appendChild( $dom->createElement( 'root', '&amp;&#0160;' ) );
var_dump( trim( $dom->saveXML() ) );
?>

Expected result:
----------------
string(59) "<?xml version="1.0" encoding="UTF-8"?>
<root>&amp;&#0160;</root>"

Actual result:
--------------
string(59) "<?xml version="1.0" encoding="UTF-8"?>
<root>&amp; </root>"

 [2007-01-11 23:28 UTC] fletch at pobox dot com
There's a trivial error in the expected result text: the string length produced by var_dump() should be 65, not 59.
 [2007-01-12 07:44 UTC] chregu@php.net
It was decided to leave it as it is (for backwards 
compatibility reasons).

Use $dom->createTextNode() for the behaviour you want.
 [2007-01-18 06:01 UTC] fletch at pobox dot com
I'm aware of the differences in escaping between createElement() and createTextNode() and neither one does what I describe.  createElement() unescapes NCRs (as I described in the original description) while createTextNode() escapes values.

In a nutshell, using createElement() I end up with "<root> </root>", and using createTextNode() I end up with "<root>&amp;#0160;</root>".

There is no way to generate the literal XML "<root>&#0160;</root>" using either createElement() or createTextNode().
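
The difference described above can be reproduced with a minimal sketch. The variable names ($viaElement, $viaText, $elementXml, $textXml) are illustrative and not part of the original report:

```php
<?php
// createElement(): the value is treated as XML content, so the numeric
// character reference is decoded into the character U+00A0.
$viaElement = new DOMDocument('1.0', 'UTF-8');
$viaElement->appendChild($viaElement->createElement('root', '&#0160;'));

// createTextNode(): the value is treated as literal text, so the "&"
// is escaped as "&amp;" when the document is serialized.
$viaText = new DOMDocument('1.0', 'UTF-8');
$root = $viaText->appendChild($viaText->createElement('root'));
$root->appendChild($viaText->createTextNode('&#0160;'));

$elementXml = trim($viaElement->saveXML());
$textXml    = trim($viaText->saveXML());

var_dump($elementXml); // <root> contains the raw UTF-8 bytes C2 A0
var_dump($textXml);    // <root> contains &amp;#0160;
```

Neither call yields the literal "&#0160;" in the serialized output, which is the gap this comment points out.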
 [2007-01-18 11:05 UTC] chregu@php.net
<root>&#0160;</root> and <root> </root> (the space here being 
a non-breaking space) are the same in the XML context. It 
doesn't matter whether you write a character as a numeric 
character reference or as the actual character. It's the same. 
You have to live with that.
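
A short sketch of that equivalence, assuming only the standard DOM extension (the variable names are illustrative): both spellings parse to the same text content.

```php
<?php
// Parse the same element twice: once with the numeric character
// reference, once with the raw UTF-8 bytes of U+00A0.
$withReference = new DOMDocument();
$withReference->loadXML('<root>&#0160;</root>');

$withCharacter = new DOMDocument();
$withCharacter->loadXML("<root>\xC2\xA0</root>");

$a = $withReference->documentElement->textContent;
$b = $withCharacter->documentElement->textContent;

var_dump($a === $b);   // bool(true)
var_dump(bin2hex($a)); // string(4) "c2a0"
```

After parsing, the DOM keeps no record of which spelling the source used, so the serializer is free to pick either form.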

And please don't just assign bugs...
 [2007-01-18 17:04 UTC] fletch at pobox dot com
Thanks for the quick response.  I did change the bug status back to "open" but I didn't assign it, tony2001@php.net did.

I believe you're right that in this particular example both XML strings are equivalent, but only because the NCR I used represents a printable character.  A non-printable character must be represented as its NCR.  There's no way to craft a node whose value contains an NCR and, as such, no way to create an element whose value contains non-printable characters.
 [2007-01-19 06:17 UTC] chregu@php.net
$dom = new DOMDocument( "1.0", 'UTF-8' );
$dom->appendChild( $dom->createElement( 'root','&#13;' ));
var_dump( trim( $dom->saveXML() ) );

prints out

string(57) "<?xml version="1.0" encoding="UTF-8"?>
<root>&#13;</root>"


Anyway, this is really not a PHP issue (if it is one at 
all), but a libxml2 issue. If you still think this is a bug, 
report it there.
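
For readers who need a character reference in the serialized output, one possible workaround is sketched below. It is not mentioned in this thread and rests on an assumption about the libxml2 serializer: when the document declares no encoding, non-ASCII characters are expected to be written as hexadecimal character references. The original decimal spelling ("&#0160;") is not recovered either way.

```php
<?php
// Assumption: with no encoding on the document, libxml2 escapes
// non-ASCII characters as hexadecimal character references on output.
$dom = new DOMDocument('1.0'); // note: no encoding argument
$dom->appendChild($dom->createElement('root', '&#0160;'));

$xml = trim($dom->saveXML());
var_dump($xml); // expected to contain &#xA0; rather than a raw U+00A0
```

The in-memory node still holds the decoded character. Only the serialized form changes, so this affects every non-ASCII character in the document, not just the one passed to createElement().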



 