Bug #31613 ampersands (&) not being translated to entities when setting nodeValue
Submitted: 2005-01-19 17:39 UTC Modified: 2005-01-21 01:42 UTC
Votes:2
Avg. Score:4.0 ± 1.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: brian dot sanders at cometsystems dot com Assigned:
Status: Not a bug Package: XML related
PHP Version: 5.0.2 OS: Fedora Core 3
Private report: No CVE-ID: None
 [2005-01-19 17:39 UTC] brian dot sanders at cometsystems dot com
Description:
------------
We have encountered the following bug when using libxml2 (via PHP5):

When we set the value of a text node, ampersands (&) are not being converted to XML entities.  These raw characters truncate our text node, resulting in lost data.

For example, after setting nodeValues, less-than signs (<) show up as entities (&lt;) in our XML file.  However, ampersands (&) show up as raw characters (&).

Note that in the example code we have silenced the output from setting the nodeValues, as this generates a PHP warning when used with newer versions of libxml2.

We have experienced this bug on at least two machines, with the following configurations:

MACHINE 1:
 LIBXML2: 2.6.16-3 (binary rpm)
 PHP: 5.0.2 (built from source)
 OS: Fedora Core Linux 3
 KERNEL: 2.6.9-1.681_FC3smp

MACHINE 2:
 LIBXML2: 2.6.7-28.4 (binary)
 PHP: 5.0.2 (built from source)
 OS: Suse Linux 9.1
 KERNEL: 2.6.4-52-default 

Reproduce code:
---------------
<?php

/* load the document (loadXML() is called on an instance; the static call form fails on newer PHP versions) */
$dom = new DOMDocument();
$dom->loadXML(<<<XML
<?xml version="1.0"?>
<test>
 <foo>Here is an &lt; to test.</foo>
 <bar>Here is an &amp; to test.</bar>
</test>
XML
);

/* confirm we have the document verbatim */
print("HERE IS THE INITIAL DOCUMENT:\n\n");
print($dom->saveXML());
print("\n--------------------------------\n\n");

/* get the LT node */
$nodeList = $dom->getElementsByTagName('foo');
$nodeLT = $nodeList->item(0);

/* get the AMP node */
$nodeList = $dom->getElementsByTagName('bar');
$nodeAMP = $nodeList->item(0);

/* resave the node values */
@$nodeLT->nodeValue = $nodeLT->nodeValue;
@$nodeAMP->nodeValue = $nodeAMP->nodeValue;

/* show the altered document */
print("HERE IS THE DOCUMENT AFTER THE SAVE:\n\n");
print($dom->saveXML());
print("\n--------------------------------\n\n");

?>


Expected result:
----------------
HERE IS THE INITIAL DOCUMENT:

<?xml version="1.0"?>
<test>
 <foo>Here is an &lt; to test.</foo>
 <bar>Here is an &amp; to test.</bar>
</test>

--------------------------------

HERE IS THE DOCUMENT AFTER THE SAVE:

<?xml version="1.0"?>

<test>
 <foo>Here is an &lt; to test.</foo>
 <bar>Here is an &amp; to test.</bar>
</test>


Actual result:
--------------
HERE IS THE INITIAL DOCUMENT:

<?xml version="1.0"?>
<test>
 <foo>Here is an &lt; to test.</foo>
 <bar>Here is an &amp; to test.</bar>
</test>

--------------------------------

HERE IS THE DOCUMENT AFTER THE SAVE:

<?xml version="1.0"?>

<test>
 <foo>Here is an &lt; to test.</foo>
 <bar>Here is an </bar>
</test>


 [2005-01-20 12:54 UTC] rrichards@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

You have to escape the value when writing to the nodeValue property in this case. This property is made available to elements as a convenience; the specs actually state that this property has no meaning for an element node. An & must therefore be escaped, so that entities can be used in the "value" of an element. (Note the warning you get when you disable error suppression.)
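
A minimal sketch of the escaping described above, assuming PHP 7.1 or later and a build where assigning to an element's nodeValue parses the string as XML content (so entity references are interpreted). The helper name setEscapedNodeValue() is hypothetical, and htmlspecialchars() with ENT_XML1 is only one way to do the escaping:

```php
<?php
// Hypothetical helper: escape XML-special characters before assigning to
// nodeValue, so a literal "&" reaches the element as the entity "&amp;".
function setEscapedNodeValue(DOMNode $node, string $text): void
{
    // ENT_XML1 limits the output to XML's predefined entities;
    // ENT_NOQUOTES leaves quotes alone, which is fine for element text.
    $node->nodeValue = htmlspecialchars($text, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
}

$dom = new DOMDocument();
$dom->loadXML('<test><bar>placeholder</bar></test>');
$bar = $dom->getElementsByTagName('bar')->item(0);

setEscapedNodeValue($bar, 'Here is an & to test.');

// Serialize only the <bar> element to inspect the result.
print($dom->saveXML($bar) . "\n");
```

Reading $bar->textContent afterwards should give back the unescaped text, since the entity is decoded when the value is assigned.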
 [2005-01-20 22:51 UTC] brian dot sanders at cometsystems dot com
This is either a bug in PHP's DOM implementation, the underlying libxml2 interface, or the PHP documentation.  

The W3C spec is vague on the use of the property nodeValue:

"The attributes nodeName, nodeValue and attributes are included as a mechanism to get at node information without casting down to the specific derived interface."

However, what is the purpose of nodeValue besides getting and setting the value of a child text node?  In these situations, all special characters, including ampersands, should be encoded by the interface.  Encoding them manually breaks a basic contract of DOM/XML/XSL, which states that each transport layer does not need to worry about the encoding of other transport layers.

On the other hand, ampersands ARE properly encoded when setting the property textContent.  Unfortunately they are not encoded when the text string is passed as the optional second argument to DOMDocument::createElement().  The example in the manual (http://us4.php.net/manual/en/function.dom-domdocument-createelement.php) explicitly sets the text value of an element using this unsafe method.  And to further complicate the issue, setting the textContent property on a node created by DOMDocument::createElement() does NOT work.  You must create a text node, set the textContent, then append the text node to the new element.

So in summary:
* Setting the textContent property encodes ampersands as &amp;
* Setting the nodeValue property does not encode ampersands, but instead truncates the string (silently, with some versions of libxml2).
* The manual suggests using the optional second argument to DOMDocument::createElement() to set the value of a node.
* The manual also suggests using the nodeValue property to set the value of a node.
* Both of these techniques will result in lost data for strings with ampersands, unless the strings are encoded manually before being used.
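
The first point in the summary can be illustrated with a variant of the reproduce code that re-saves the values through textContent instead of nodeValue. This is a sketch that assumes a PHP version where textContent is writable on element nodes; the document string is taken from the report, with whitespace removed for brevity:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<test><foo>Here is an &lt; to test.</foo><bar>Here is an &amp; to test.</bar></test>'
);

foreach (array('foo', 'bar') as $tag) {
    $node = $dom->getElementsByTagName($tag)->item(0);
    // textContent is read as decoded text ("<" or "&") and written back as
    // literal text; the serializer re-escapes it on output.
    $node->textContent = $node->textContent;
}

// Serialize the root element without the XML declaration.
print($dom->saveXML($dom->documentElement) . "\n");
```

Because the string is treated as literal text on assignment, no manual escaping is needed in this variant.');
</test>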
 [2005-01-21 01:42 UTC] rrichards@php.net
You are correct: "You must create a text node, set the textContent, then append the text node to the new element". The DOM specs say that nodeValue is null for elements (it doesn't really exist). For an element it accepts entities, so a lone ampersand won't work. The behavior is not going to change.
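
A rough sketch of the text-node approach confirmed above. The helper name createElementWithText() and the sample string are hypothetical; the text is passed straight to DOMDocument::createTextNode(), which takes it as literal character data:

```php
<?php
// Hypothetical helper: build an element whose text is supplied through a
// text node rather than through createElement()'s second argument.
function createElementWithText(DOMDocument $dom, $name, $text)
{
    $element = $dom->createElement($name);
    // createTextNode() stores the string as-is; escaping happens on output.
    $element->appendChild($dom->createTextNode($text));
    return $element;
}

$dom = new DOMDocument();
$root = $dom->appendChild($dom->createElement('test'));
$root->appendChild(createElementWithText($dom, 'bar', 'Fish & chips < 5'));

// Serialize the root element without the XML declaration.
print($dom->saveXML($root) . "\n");
```

The ampersand and the less-than sign are only turned into entities when the document is serialized, so the caller never has to pre-escape the string.');
</test>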
 