Bug #33020 getAttribute() incorrectly de-references entities
Submitted: 2005-05-12 21:53 UTC Modified: 2005-05-13 08:53 UTC
From: mark at kimsal dot com Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.0.4 OS: Linux
Private report: No CVE-ID: None
 [2005-05-12 21:53 UTC] mark at kimsal dot com
Description:
------------
When using getAttribute() to get the value of an attribute
that contains an XML entity (e.g. &#174;), the returned value
is neither the actual character being represented
nor the characters that make up the entity ('&' '#' '1' '7'
'4' ';'). Instead, two characters are returned: an A with
circumflex (Â) and, in this example, the registered
trademark symbol (®).
   
'./configure' '--prefix=/opt/php5' '--with-mysql'    
'--with-apxs2=/usr/sbin/apxs2' '--with-zlib' '--with-expat'    
'--enable-sigchild' '--with-openssl' '--enable-shared'    
'--enable-soap' '--with-gd' '--enable-mbstring'    
'--with-pgsql' '--with-imap=shared,/usr/lib'    
'--with-imap-ssl=no' '--with-gd'    
'--with-jpeg-dir=/usr/include' '--with-ldap' '--with-xsl'    
'--enable-memory-limit' '--enable-debug'    
'--with-libxml-dir=/usr'    
   
   

Reproduce code:
---------------
ent.php:
        $dom = new DOMDocument();
        // Parser options only take effect when set before load(),
        // so they are set once here rather than repeated afterwards.
        $dom->substituteEntities = true;
        $dom->resolveExternals = true;
        $dom->load('ent.xml');

        echo $dom->saveXml();
        $list = $dom->getElementsByTagName('bar');
        $node = $list->item(0);
        echo "\n===\n Attribute 'test' of node 'bar': ";
        echo $node->getAttribute("test");
        echo "\n===\n Attribute 'test' of node 'bar' with htmlentities: ";
        echo htmlentities($node->getAttribute("test"));
        echo "\n===\n";

ent.xml:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY reg "&#174;">
]>
<foo>
        <bar test="&reg;">&#174;</bar>
</foo>


Expected result:
----------------
<?xml version="1.0"?> 
<!DOCTYPE foo [ 
<!ENTITY reg "&#174;"> 
]> 
<foo> 
        <bar test="&#174;">&#174;</bar> 
</foo> 
 
=== 
 Attribute 'test' of node 'bar': &#174; 
=== 
 Attribute 'test' of node 'bar' with htmlentities: 
&amp;#174; 
=== 
 

Actual result:
--------------
<?xml version="1.0"?> 
<!DOCTYPE foo [ 
<!ENTITY reg "&#174;"> 
]> 
<foo> 
        <bar test="&#xAE;">&#xAE;</bar> 
</foo> 
 
=== 
 Attribute 'test' of node 'bar': ® 
=== 
 Attribute 'test' of node 'bar' with htmlentities: 
&Acirc;&reg; 
=== 
 

 [2005-05-13 08:53 UTC] chregu@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.

The value is UTF-8 encoded, but you output it as ISO-8859-1 (the same goes for htmlentities()).

(Almost) anything from DOM is UTF-8 encoded. 
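
The point above can be checked with a short sketch. It is not part of the original report: it inlines ent.xml via loadXML() so it is self-contained, and it assumes the mbstring extension is available for the conversions. It shows that getAttribute() returns the two UTF-8 bytes of U+00AE, that reading those bytes as ISO-8859-1 gives the "Â®" seen in the report, and that telling htmlentities() the input charset gives the expected entity.

```php
<?php
// Same document as ent.xml, inlined so the sketch does not need a file.
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY reg "&#174;">
]>
<foo>
        <bar test="&reg;">&#174;</bar>
</foo>
XML;

$dom = new DOMDocument();
$dom->substituteEntities = true;
$dom->loadXML($xml);

$value = $dom->getElementsByTagName('bar')->item(0)->getAttribute('test');

// DOM hands back UTF-8: U+00AE is the two bytes C2 AE.
$utf8Hex = bin2hex($value);                      // "c2ae"

// Treating those two bytes as ISO-8859-1 yields two characters, "Â" and "®".
$misread = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
$misreadLength = mb_strlen($misread, 'UTF-8');   // 2

// Passing the real charset to htmlentities() gives a single entity.
$escaped = htmlentities($value, ENT_QUOTES, 'UTF-8');   // "&reg;"

// For a page that really is served as ISO-8859-1, convert before output.
$latin1Hex = bin2hex(mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8')); // "ae"

echo $utf8Hex, "\n", $misreadLength, "\n", $escaped, "\n", $latin1Hex, "\n";
```

Alternatively, serving the page as UTF-8 (for example with a `Content-Type: text/html; charset=UTF-8` header) avoids the conversion entirely.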
 