Bug #36795 Inappropriate "unterminated entity reference" in DOMElement->setAttribute
Submitted: 2006-03-20 02:05 UTC
Modified: 2010-04-09 14:01 UTC
Votes:64
Avg. Score:4.5 ± 0.7
Reproduced:60 of 60 (100.0%)
Same Version:29 (48.3%)
Same OS:49 (81.7%)
From: john at carney dot id dot au
Assigned:
Status: Not a bug
Package: DOM XML related
PHP Version: 5.*, 6
OS: *
Private report: No
CVE-ID:
 [2006-03-20 02:05 UTC] john at carney dot id dot au
Description:
------------
While it is not specifically mentioned in the documentation, DOMElement->setAttribute automatically escapes XML special characters in the value parameter. Yet, as of PHP 5.1.2 it will throw an "unterminated entity reference" warning if the supplied value contains an ampersand - even if it is escaped.

As well as fixing the actual bug, the documentation needs to clarify *exactly* how special characters in the inputs to this and other DOM functions are treated. If you are going to silently escape input text, you need to tell people so that they don't end up with stuff being double-escaped.

Reproduce code:
---------------
$element->setAttribute ("anattr", "jack & jill") ;
$element->setAttribute ("anattr", "jack &amp; jill") ;

Expected result:
----------------
No warnings should be thrown.

Actual result:
--------------
BOTH calls to setAttribute throw an "unterminated entity reference" warning.
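
The two cases in the report can be checked in isolation with a sketch like the one below. It assumes a current PHP build with the DOM extension; the element name `item` and the variable names are illustrative and not from the report. On such a build the value is expected to be stored as literal text and escaped only on serialization, so a pre-escaped value comes out double-escaped. Behavior on the 5.1.2 build described in the report may differ.

```php
<?php
// Minimal sketch: how DOMElement::setAttribute treats an ampersand.
// Assumption: recent PHP with ext/dom; "item" is an illustrative element name.
$doc = new DOMDocument('1.0', 'UTF-8');
$element = $doc->appendChild($doc->createElement('item'));

// Raw ampersand: stored as text, escaped when the node is serialized.
$element->setAttribute('anattr', 'jack & jill');
$rawStored = $element->getAttribute('anattr');
$rawXml = $doc->saveXML($element);

// Pre-escaped ampersand: the "&amp;" is itself treated as literal text.
$element->setAttribute('anattr', 'jack &amp; jill');
$escapedStored = $element->getAttribute('anattr');
$escapedXml = $doc->saveXML($element);

echo $rawXml, "\n", $escapedXml, "\n";
```

Comparing `getAttribute()` with the serialized form makes it clear whether a given build escapes the value once, twice, or not at all.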

 [2006-04-01 04:49 UTC] tamit at xmission dot com
This is most definitely a bug.  I've replicated by producing the following tree in my code:

(This is well-formed XML so I have no idea why there would be a problem.)

---------BEGIN XML----------------------------
<?xml version="1.0" encoding="iso-8859-1"?>
<classes>
	<class classid="0" parentid="" class_level="0">Root<class classid="1" parentid="0" class_level="1">Adhesives </class>
		<class classid="3286" parentid="0" class_level="1">Agricultural and Farming Products</class>
		<class classid="3283" parentid="0" class_level="1">Architectural and Civil Engineering Products</class>
		<class classid="14" parentid="0" class_level="1">Automatic ID</class>
		<class classid="45" parentid="0" class_level="1">Chemical Processing </class>
		<class classid="124" parentid="0" class_level="1">Cleaning Products </class>
		<class classid="148" parentid="0" class_level="1">Communication Systems </class>
		<class classid="264" parentid="0" class_level="1">Computer Hardware </class>
		<class classid="3281" parentid="0" class_level="1">Construction Equipment and Supplies</class>
		<class classid="489" parentid="0" class_level="1">Controls </class>
		<class classid="589" parentid="0" class_level="1">Display </class>
		<class classid="612" parentid="0" class_level="1">Electrical Equipment </class>
		<class classid="772" parentid="0" class_level="1">Electronic Components </class>
		<class classid="3282" parentid="0" class_level="1">Explosives,  Armaments, and Weaponry</class>
		<class classid="920" parentid="0" class_level="1">Fasteners </class>
		<class classid="954" parentid="0" class_level="1">Fluid </class>
		<class classid="3461" parentid="0" class_level="1">Food Processing </class>
		<class classid="3288" parentid="0" class_level="1">Health, Medical, </class>
		<class classid="1029" parentid="0" class_level="1">HVAC</class>
		<class classid="1068" parentid="0" class_level="1">Labels Tags Signage </class>
		<class classid="3279" parentid="0" class_level="1">Laboratory and Research Supplies and Equipment</class>
		<class classid="1083" parentid="0" class_level="1">Lubricants</class>
		<class classid="1106" parentid="0" class_level="1">Machinery </class>
		<class classid="1424" parentid="0" class_level="1">Material Handling </class>
		<class classid="1303" parentid="0" class_level="1">Materials </class>
		<class classid="3284" parentid="0" class_level="1">Mechanical Components and Assemblies</class>
		<class classid="1620" parentid="0" class_level="1">Mechanical Power Transmission</class>
		<class classid="3462" parentid="0" class_level="1">Mining, Oil Drilling </class>
		<class classid="1728" parentid="0" class_level="1">Mounting </class>
		<class classid="3285" parentid="0" class_level="1">Non-Industrial Products</class>
		<class classid="1782" parentid="0" class_level="1">Optics </class>
		<class classid="2054" parentid="0" class_level="1">Packaging Equipment </class>
		<class classid="2151" parentid="0" class_level="1">Paints </class>
		<class classid="2185" parentid="0" class_level="1">Plant Furnishings </class>
		<class classid="2196" parentid="0" class_level="1">Portable Tools</class>
		<class classid="2286" parentid="0" class_level="1">Printing </class>
		<class classid="3539" parentid="0" class_level="1">Problematic Headings</class>
		<class classid="3463" parentid="0" class_level="1">Retail and Sales Equipment</class>
		<class classid="2328" parentid="0" class_level="1">Robotics</class>
		<class classid="2369" parentid="0" class_level="1">Safety </class>
		<class classid="2399" parentid="0" class_level="1">Sensors Monitors </class>
		<class classid="3280" parentid="0" class_level="1">Services</class>
		<class classid="2585" parentid="0" class_level="1">Software</class>
		<class classid="2697" parentid="0" class_level="1">Test </class>
		<class classid="3919" parentid="0" class_level="1">Textile Industry Products</class>
		<class classid="3167" parentid="0" class_level="1">Thermal </class>
		<class classid="3190" parentid="0" class_level="1">Timers </class>
		<class classid="3287" parentid="0" class_level="1">Transportation Industry Products</class>
		<class classid="3193" parentid="0" class_level="1">Vision Systems</class>
		<class classid="3208" parentid="0" class_level="1">Waste Handling Equipment</class>
		<class classid="3246" parentid="0" class_level="1">Welding Equipment </class>
	</class>
</classes>
--------------END XML---------------------------
 [2006-12-06 11:49 UTC] philippe dot levan_nospam at kitpages dot fr
Hi,

I have the same problem. My config is :
- PHP 5.2
- libxml Version 2.6.16
---------
<?php
$xmlStr = "<?xml version='1.0' encoding='UTF-8'?><root></root>";
$xml = new SimpleXMLElement($xmlStr);
$xml->addChild("foo",utf8_encode("start < > end"));
echo "foo tag added ok";
$xml->addChild("bar",utf8_encode("start & end"));
echo "error on bar tag because of &amp;";
$result = $xml->asXML();
echo "<pre>".htmlentities($result)."</pre>";
?>
-------------
you can run this script at : http://www.kitpages.fr/test/bugSimpleXml.php
 [2007-11-27 14:03 UTC] oscar at cdcovers dot to
I tried the workaround below and it seems to work:

$xml->addChild('element', '');
$xml->element = str_replace("&", "&amp;", "value of the element");
 [2008-02-08 20:09 UTC] moshe at varien dot com
PHP 5.2.4
Looks like the problem appears when a node that already exists is being overwritten

// works ok, doesn't require encoding:
$a = simplexml_load_string('<a/>'); 
$a->b = "& < ' ";

// doesn't work, requires encoding:
$a = simplexml_load_string('<a><b>test</b></a>'); 
$a->b = "& < ' "; 

// doesn't work, always requires encoding
$a->addChild('b', "& < '");
$a->addAttribute('b', "& < '");

// works ok, never requires encoding
$a['b'] = "& < '";
 [2008-04-03 23:15 UTC] rob at electronicinsight dot com
A little hack to get around this bug:

function &safe_add_child(&$sxml, $name, $value) {
  $safe_value = preg_replace('/&(?!\w+;)/', '&amp;', $value);
  return $sxml->addChild($name, $safe_value);
}
 [2009-10-22 16:28 UTC] gary dot malcolm at gmail dot com
I'm running PHP 5.2.9 on Linux and this bug is still alive and well making SimpleXml absolutely inappropriate for XML communications between systems.
<code>
$safe_value = preg_replace('/&(?!\w+;)/', '&amp;', $value);
  return $sxml->addChild($name, $safe_value);
</code>
Is just plain wrong. I'm communicating user input directly to a bank as I can't know how the third party will parse their xml.
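
The objection to the regex can be illustrated with plain string functions. This is a sketch only; the sample input is hypothetical user text, not taken from the thread. The negative lookahead skips anything that looks like `&word;`, so text that merely resembles an entity passes through unescaped, whereas `htmlspecialchars()` escapes every ampersand.

```php
<?php
// Sketch: why the negative-lookahead regex is not a safe escaper.
// The sample input is hypothetical user text, not taken from the thread.
$value = 'R&D; budget & plans';

// The regex skips any "&word;" sequence, assuming it is already an entity.
$regexEscaped = preg_replace('/&(?!\w+;)/', '&amp;', $value);

// htmlspecialchars() escapes every ampersand unconditionally.
$fullyEscaped = htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');

echo $regexEscaped, "\n"; // R&D; budget &amp; plans
echo $fullyEscaped, "\n"; // R&amp;D; budget &amp; plans
```

With the regex, `&D;` would reach `addChild()` as a reference to an undefined entity rather than as the literal text the user typed.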
 [2010-02-04 18:23 UTC] jalday at delivery dot com
Still seeing this issue... 

$order_x->addChild('location', '1st & 52nd');

gives "Warning: SimpleXMLElement::addChild(): unterminated entity reference"

If I run it as

$order_x->addChild('location', htmlspecialchars('1st & 52nd'));

I have no problems.
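
A self-contained version of this workaround might look like the sketch below. The root element name `order` is an assumption made for illustration, and `ENT_XML1` is passed so that the escaping follows XML rather than HTML rules.

```php
<?php
// Sketch of the workaround: escape the value before passing it to addChild().
// Assumption: "order" is an illustrative root element name.
$order_x = new SimpleXMLElement('<order/>');
$order_x->addChild('location', htmlspecialchars('1st & 52nd', ENT_XML1, 'UTF-8'));

$text = (string) $order_x->location; // value as read back from the tree
$xml = $order_x->asXML();            // serialized document

echo $xml;
```

Reading the value back should give the original string with a single ampersand, while the serialized document carries `&amp;`.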
 [2010-04-09 14:01 UTC] rrichards@php.net
-Status: Open +Status: Bogus
 [2010-04-09 14:01 UTC] rrichards@php.net
Behavior as defined by DOM specs. No warnings are issued from either of the 2 
examples in the reproduce code.

The addChild() method described in later reports works as defined by specs. Use the 
simplexml property accessors for auto escaping.
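
As a rough illustration of the property-accessor route, the sketch below assigns raw text to a new child and then overwrites it. It assumes a current PHP build; the element names are illustrative.

```php
<?php
// Sketch: SimpleXML property accessors escape the assigned text themselves.
// Assumption: recent PHP; "order" and "location" are illustrative names.
$order = new SimpleXMLElement('<order/>');

// Creating a child through the property accessor, with a raw ampersand.
$order->location = '1st & 52nd';
$created = (string) $order->location;

// Overwriting the same child with text containing &, < and >.
$order->location = 'Fish & Chips <to go>';
$overwritten = (string) $order->location;

$xml = $order->asXML();
echo $xml;
```

No manual escaping is applied here; the special characters are expected to appear escaped only in the serialized output.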
 [2010-09-22 01:02 UTC] steven at navolutions dot com
I also had this issue. One thing that might not have been included in the original reproduction is that the DOMElement may have been extended. Mine is extended, so try reproducing it by extending the DOMElement class. I also extended the DOMDocument class, so try that too. So no, the status is not Bogus; it was just not tested thoroughly.
 [2011-02-23 03:30 UTC] jan-bugreport at gmx dot de
With simpleXML, addChild($name, $value) works really weird (tested on 5.3.1 on win): in the value, the characters < and > are correctly escaped to &lt; and &gt; but ampersands cause the "unterminated entity reference" message. I would understand if it escaped nothing, or if it escaped everything, but this seems weird.

Also, no matter what the final decision about this bug will be, this should be documented really well in the SimpleXML docs. It is confusing and I could imagine it could cause security issues in some applications.
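
One way to sidestep the asymmetry is to add the child without a value and attach the text through the DOM API, which treats it as literal character data. The helper name `add_text_child` below is hypothetical, and the sketch assumes both the SimpleXML and DOM extensions are available.

```php
<?php
// Sketch: add a child whose value is always treated as literal text.
// add_text_child() is a hypothetical helper, not part of SimpleXML.
function add_text_child(SimpleXMLElement $parent, string $name, string $text): SimpleXMLElement
{
    // Create the element empty so that addChild() has no value to parse.
    $child = $parent->addChild($name);

    // Attach the text as a DOM text node; it is escaped on serialization.
    $node = dom_import_simplexml($child);
    $node->appendChild($node->ownerDocument->createTextNode($text));

    return $child;
}

$root = new SimpleXMLElement('<root/>');
$child = add_text_child($root, 'bar', 'start & end <b> &lt;');
$literal = (string) $child;
$xml = $root->asXML();

echo $xml;
```

Because nothing in the input is interpreted as markup or as an entity reference, the caller never needs to pre-escape and cannot double-escape by accident.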
 [2011-09-11 01:40 UTC] abxccd at msn dot com
I am still seeing this bug in PHP 5.3.8
 [2011-10-08 18:33 UTC] matteosistisette at gmail dot com
I'm still observing this issue (by the way, why is it marked as "bogus"?).

Even the simplexml property accessors give me the warning, such as:

$a['b'] = "& < '"; // GENERATES THE WARNING!!!!!!!!
 [2013-06-27 18:01 UTC] hanskrentel at yahoo dot de
This is bogus because & as a character is needed in attribute values to start an 
entity or character reference. Otherwise it would not be possible to set 
the superset of all XML attribute values (AttValue; http://www.w3.org/TR/xml/#NT-AttValue); 
the expression wouldn't be distinct.

It is like needing to write "\t" in a PHP string to express a tab and therefore "\\" to 
express the backslash. I hope this clarifies this a bit.
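
The same point can be shown with SimpleXMLElement::addChild(), which was discussed earlier in the thread. The sketch below assumes a current PHP build and uses illustrative element names. A value passed to addChild() may contain references, so a literal ampersand has to be written as `&amp;`, much as a literal backslash is written as `"\\"`.

```php
<?php
// Sketch: addChild() interprets references in its value argument.
// Assumption: recent PHP; element names are illustrative.
$root = new SimpleXMLElement('<root/>');

// A numeric character reference is resolved to the character it names.
$root->addChild('ref', 'caf&#233;');

// An escaped ampersand is resolved to a single literal ampersand.
$root->addChild('amp', 'jack &amp; jill');

$refText = (string) $root->ref;
$ampText = (string) $root->amp;

echo $refText, "\n", $ampText, "\n";
```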
 
Last updated: Wed Apr 23 09:02:23 2014 UTC