Bug #33020 getAttribute() incorrectly de-references entities
Submitted: 2005-05-12 21:53 UTC
Modified: 2005-05-13 08:53 UTC
From: mark at kimsal dot com
Assigned:
Status: Not a bug
Package: DOM XML related
PHP Version: 5.0.4
OS: Linux
Private report: No
CVE-ID: None
 [2005-05-12 21:53 UTC] mark at kimsal dot com
Description:
------------
When getAttribute() is used to read an attribute value
that contains an XML entity (e.g. &#174;), the returned value
is neither the actual character being represented
nor the characters that make up the entity ('&' '#' '1' '7'
'4' ';'). Instead, two characters are returned: an A with
circumflex (Â) and, in this example, the registered
trademark symbol (®).
   
'./configure' '--prefix=/opt/php5' '--with-mysql'    
'--with-apxs2=/usr/sbin/apxs2' '--with-zlib' '--with-expat'    
'--enable-sigchild' '--with-openssl' '--enable-shared'    
'--enable-soap' '--with-gd' '--enable-mbstring'    
'--with-pgsql' '--with-imap=shared,/usr/lib'    
'--with-imap-ssl=no' '--with-gd'    
'--with-jpeg-dir=/usr/include' '--with-ldap' '--with-xsl'    
'--enable-memory-limit' '--enable-debug'    
'--with-libxml-dir=/usr'    
   
   

Reproduce code:
---------------
ent.php:
        $dom = new DOMDocument();
        // Parser options are read at parse time, so set them before load().
        $dom->substituteEntities = true;
        $dom->resolveExternals = true;
        $dom->load('ent.xml');

        // Dump the parsed document, then the 'test' attribute of the first <bar>.
        echo $dom->saveXML();
        $list = $dom->getElementsByTagName('bar');
        $node = $list->item(0);
        echo "\n===\n Attribute 'test' of node 'bar': ";
        echo $node->getAttribute("test");
        echo "\n===\n Attribute 'test' of node 'bar' with htmlentities: ";
        echo htmlentities($node->getAttribute("test"));
        echo "\n===\n";

ent.xml:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY reg "&#174;">
]>
<foo>
        <bar test="&reg;">&#174;</bar>
</foo>


Expected result:
----------------
<?xml version="1.0"?> 
<!DOCTYPE foo [ 
<!ENTITY reg "&#174;"> 
]> 
<foo> 
        <bar test="&#174;">&#174;</bar> 
</foo> 
 
=== 
 Attribute 'test' of node 'bar': &#174; 
=== 
 Attribute 'test' of node 'bar' with htmlentities: &amp;#174;
=== 
 

Actual result:
--------------
<?xml version="1.0"?> 
<!DOCTYPE foo [ 
<!ENTITY reg "&#174;"> 
]> 
<foo> 
        <bar test="&#xAE;">&#xAE;</bar> 
</foo> 
 
=== 
 Attribute 'test' of node 'bar': ® 
=== 
 Attribute 'test' of node 'bar' with htmlentities: &Acirc;&reg;
=== 
 

 [2005-05-13 08:53 UTC] chregu@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.

The value is UTF-8 encoded, but you output it as ISO-8859-1 (the same goes for htmlentities()).

(Almost) anything from DOM is UTF-8 encoded. 
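
A minimal sketch of what this means in practice, not part of the original exchange. It assumes the XML from the report is loaded from an inline string (loadXML() instead of load('ent.xml')) so that it is self-contained, and the variable names are illustrative only. It shows that the attribute value is the two-byte UTF-8 encoding of U+00AE, and that htmlentities() yields the reported output when it is told to treat those bytes as ISO-8859-1, but a single entity when told they are UTF-8.

```php
<?php
// Inline copy of ent.xml from the report (assumption: string instead of a file).
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY reg "&#174;">
]>
<foo>
        <bar test="&reg;">&#174;</bar>
</foo>
XML;

$dom = new DOMDocument();
$dom->substituteEntities = true;   // same parser option as in the report
$dom->loadXML($xml);

$value = $dom->getElementsByTagName('bar')->item(0)->getAttribute('test');

// DOM hands back UTF-8: U+00AE is encoded as the two bytes 0xC2 0xAE.
echo bin2hex($value), "\n";

// Treating those bytes as ISO-8859-1 turns each byte into its own entity,
// which is the "Actual result" shown in the report.
$asLatin1 = htmlentities($value, ENT_QUOTES, 'ISO-8859-1');
echo $asLatin1, "\n";

// Declaring the input as UTF-8 gives one entity for the one character.
$asUtf8 = htmlentities($value, ENT_QUOTES, 'UTF-8');
echo $asUtf8, "\n";
```

When echoing the raw value instead, the output has to be declared as UTF-8 as well (for example through the charset in the Content-Type header), otherwise the client may render the two bytes as "Â®".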
 