Bug #46307 xml_parse / xml_set_character_data_handler doesn't resolve simple entities
Submitted: 2008-10-15 19:52 UTC Modified: 2008-10-16 18:07 UTC
From: phpbugs at lange dot nom dot fr Assigned:
Status: Not a bug Package: XML related
PHP Version: 5.2.6 OS: GNU/Linux, Fedora Release 9
Private report: No CVE-ID: None
 [2008-10-15 19:52 UTC] phpbugs at lange dot nom dot fr
Description:
------------
It may be a misunderstanding on my part, but xml_parse doesn't seem to resolve 'simple' entities like &lt;, &gt; or &amp;. According to the documentation and one other bug report, I think it should.
Other parsers (XMLReader, SimpleXML, DOM, ...) do resolve them.

In numeric form (e.g. &#x3e;), it does work.

I tested with libxml2 versions 2.7.1 and 2.7.2, and expat 2.0.1.

I found this while making SOAP calls: I couldn't pass the value "<" to a PHP SOAP server (which uses NuSOAP).

I don't have a workaround for the moment.
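
One possible stopgap, given that the numeric form is reported above to come through intact, is to rewrite the five predefined entities as numeric character references before handing the string to xml_parse. This is only a sketch under that assumption: the helper name predefined_entities_to_numeric is hypothetical (it is not part of PHP), and a plain string replacement would also alter literal "&lt;" text inside CDATA sections, so it is unsuitable for documents that contain them.

```php
<?php
// Hypothetical helper, not part of PHP: replace the five predefined XML
// entities with equivalent decimal character references.
// Caveat: this is a blind string replacement, so it also rewrites
// entity-like text inside CDATA sections, where it should stay literal.
function predefined_entities_to_numeric($xml)
{
        // strtr() with an array never rescans its own output, so
        // "&amp;lt;" becomes "&#38;lt;" and still parses to the text "&lt;".
        return strtr($xml, array(
                '&lt;'   => '&#60;',
                '&gt;'   => '&#62;',
                '&amp;'  => '&#38;',
                '&quot;' => '&#34;',
                '&apos;' => '&#39;',
        ));
}

print(predefined_entities_to_numeric("<element>A&gt;B&amp;C&lt;D</element>") . "\n");
```

The rewritten string could then be passed to xml_parse in place of the original one.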

Reproduce code:
---------------
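<?php
// Character data handler for the xml_* parser: prints each chunk it receives.
function character_data($parser, $data){
        print("character_data: $data\n");
}

// Parse the same string twice: first with xml_parse, then with XMLReader.
function parsestring( $strg ) {
        $parser = xml_parser_create("UTF-8");
        xml_set_character_data_handler($parser,'character_data');
        xml_parse($parser,$strg,true);
        xml_parser_free($parser);

        // XMLReader pass, for comparison: print every node that has a value.
        $reader = new XMLReader();
        $reader->XML( $strg );
        while( $reader->read() ) {
                if( $reader->value ) print( $reader->name . ": " . $reader->value . "\n");
        }
}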

$xmlcdata = "<element><![CDATA[A>B&C<D]]></element>";
$xmlescaped = "<element>A&gt;B&amp;C&lt;D</element>";
$xmlnumeric = "<element>A&#x3e;B&#x26;C&#x3c;D</element>";

print("parsing $xmlcdata \n");
parsestring( $xmlcdata );
print("\nparsing $xmlescaped \n");
parsestring( $xmlescaped );
print("\nparsing $xmlnumeric \n");
parsestring( $xmlnumeric );


Expected result:
----------------
parsing <element><![CDATA[A>B&C<D]]></element>
character_data: A>B&C<D
#cdata-section: A>B&C<D

parsing <element>A&gt;B&amp;C&lt;D</element>
character_data: A
character_data: >
character_data: B
character_data: &
character_data: C
character_data: <
character_data: D
#text: A>B&C<D

parsing <element>A&#x3e;B&#x26;C&#x3c;D</element>
character_data: A
character_data: >
character_data: B
character_data: &
character_data: C
character_data: <
character_data: D
#text: A>B&C<D


Actual result:
--------------
parsing <element><![CDATA[A>B&C<D]]></element>
character_data: A>B&C<D
#cdata-section: A>B&C<D

parsing <element>A&gt;B&amp;C&lt;D</element>
character_data: A
character_data: B
character_data: C
character_data: D
#text: A>B&C<D

parsing <element>A&#x3e;B&#x26;C&#x3c;D</element>
character_data: A
character_data: >
character_data: B
character_data: &
character_data: C
character_data: <
character_data: D
#text: A>B&C<D
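
Note that in the output above the character data handler fires several times for a single text node, once per chunk. Code that needs the whole text of an element can buffer the chunks and join them. The following is a minimal sketch of that buffering; the names $buffer, collect_chunk and parse_text are illustrative only, and it uses the numeric form, which is the one reported to work here.

```php
<?php
// Sketch: join all character-data chunks of a document into one string.
// $buffer, collect_chunk and parse_text are illustrative names.
$buffer = '';

function collect_chunk($parser, $data)
{
        global $buffer;
        // The handler may be called several times per text node,
        // so append each chunk instead of overwriting.
        $buffer .= $data;
}

function parse_text($xml)
{
        global $buffer;
        $buffer = '';
        $parser = xml_parser_create("UTF-8");
        xml_set_character_data_handler($parser, 'collect_chunk');
        xml_parse($parser, $xml, true);
        // No explicit free here: the parser is released when $parser
        // goes out of scope at the end of this function.
        return $buffer;
}

print(parse_text("<element>A&#x3e;B&#x26;C&#x3c;D</element>") . "\n");
```

Buffering does not bring back chunks that the parser never delivers, so it does not by itself fix the named-entity case shown in the actual result.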


 [2008-10-16 18:07 UTC] rrichards@php.net
Please do not submit the same bug more than once. An existing
bug report already describes this very problem. Even if you feel
that your issue is somewhat different, the resolution is likely
to be the same. 

Thank you for your interest in PHP.

Duplicate of bug #45996
 