Bug #46307 xml_parse / xml_set_character_data_handler doesn't resolve simple entities
Submitted: 2008-10-15 19:52 UTC Modified: 2008-10-16 18:07 UTC
From: phpbugs at lange dot nom dot fr Assigned:
Status: Not a bug Package: XML related
PHP Version: 5.2.6 OS: GNU/Linux, Fedora Release 9
Private report: No CVE-ID: None
 [2008-10-15 19:52 UTC] phpbugs at lange dot nom dot fr
Description:
------------
It may be a misunderstanding on my part, but xml_parse doesn't seem to resolve 'simple' entities like &lt;, &gt; or &amp;. According to the doc, and one other bug report, I think it should, and other parsers (XMLReader, SimpleXML, DOM, ...) do.

In numerical form, it does work (&#x3e;).

I tested with libxml2 versions 2.7.1 and 2.7.2, and expat 2.0.1.

I found this when doing SOAP calls: I couldn't pass the value "<" to a PHP SOAP server (which is using NuSOAP).

I don't have a workaround for the moment.
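
One possible interim approach, where the calling code controls the parsing, is to read the text through XMLReader, which resolves all three forms in the output further down. The sketch below is only an illustration: the helper name reader_text is hypothetical, and it does not help when the parsing happens inside a library such as NuSOAP.

```php
<?php
// Hypothetical helper (not part of the report): concatenate all text and
// CDATA nodes of a document using XMLReader instead of xml_parse.
function reader_text(string $xml): string
{
    $reader = new XMLReader();
    if (!$reader->XML($xml)) {
        throw new RuntimeException("could not load XML string");
    }

    $text = '';
    while ($reader->read()) {
        // Only text and CDATA nodes carry character data.
        if ($reader->nodeType === XMLReader::TEXT
            || $reader->nodeType === XMLReader::CDATA) {
            $text .= $reader->value;
        }
    }
    $reader->close();

    return $text;
}

// Usage with the escaped form from the reproduce code.
print(reader_text("<element>A&gt;B&amp;C&lt;D</element>") . "\n");
```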

Reproduce code:
---------------
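<?php
// Called by the xml extension for each chunk of character data it reports.
function character_data($parser, $data){
        print("character_data: $data\n");
}

// Parse the same string twice: first with xml_parse, then with XMLReader.
function parsestring( $strg ) {
        // Event-based parser: prints one line per character data callback.
        $parser = xml_parser_create("UTF-8");
        xml_set_character_data_handler($parser,'character_data');
        xml_parse($parser,$strg,true);
        xml_parser_free($parser);

        // Pull parser: prints every node that has a non-empty value.
        $reader = new XMLReader();
        $reader->XML( $strg );
        while( $reader->read() ) {
                if( $reader->value ) print( $reader->name . ": " . $reader->value . "\n");
        }
}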

$xmlcdata = "<element><![CDATA[A>B&C<D]]></element>";
$xmlescaped = "<element>A&gt;B&amp;C&lt;D</element>";
$xmlnumeric = "<element>A&#x3e;B&#x26;C&#x3c;D</element>";

print("parsing $xmlcdata \n");
parsestring( $xmlcdata );
print("\nparsing $xmlescaped \n");
parsestring( $xmlescaped );
print("\nparsing $xmlnumeric \n");
parsestring( $xmlnumeric );


Expected result:
----------------
parsing <element><![CDATA[A>B&C<D]]></element>
character_data: A>B&C<D
#cdata-section: A>B&C<D

parsing <element>A&gt;B&amp;C&lt;D</element>
character_data: A
character_data: >
character_data: B
character_data: &
character_data: C
character_data: <
character_data: D
#text: A>B&C<D

parsing <element>A&#x3e;B&#x26;C&#x3c;D</element>
character_data: A
character_data: >
character_data: B
character_data: &
character_data: C
character_data: <
character_data: D
#text: A>B&C<D


Actual result:
--------------
parsing <element><![CDATA[A>B&C<D]]></element>
character_data: A>B&C<D
#cdata-section: A>B&C<D

parsing <element>A&gt;B&amp;C&lt;D</element>
character_data: A
character_data: B
character_data: C
character_data: D
#text: A>B&C<D

parsing <element>A&#x3e;B&#x26;C&#x3c;D</element>
character_data: A
character_data: >
character_data: B
character_data: &
character_data: C
character_data: <
character_data: D
#text: A>B&C<D
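
The results above show the character data handler being called once per chunk rather than once per text node, so a handler usually has to join the chunks itself. A minimal sketch of that buffering follows; the helper name collect_text and the per-element stack are assumptions made for illustration. On a build showing the actual result above, the entity chunks would still be missing, since buffering only joins what the parser delivers.

```php
<?php
// Hypothetical helper (not part of the report): return [name, text] pairs,
// one per element, with the character data chunks joined together.
function collect_text(string $xml): array
{
    $texts = [];   // finished [name, text] pairs, in closing order
    $stack = [];   // one text buffer per currently open element

    $parser = xml_parser_create("UTF-8");
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$stack) {
            $stack[] = '';                       // open a new buffer
        },
        function ($parser, $name) use (&$stack, &$texts) {
            $texts[] = [$name, array_pop($stack)];
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$stack) {
            if ($stack) {
                // Append this chunk to the innermost open element.
                $stack[count($stack) - 1] .= $data;
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(
            xml_error_string(xml_get_error_code($parser))
        );
    }

    return $texts;
}

// Usage with the escaped form from the reproduce code.
print_r(collect_text("<element>A&gt;B&amp;C&lt;D</element>"));
```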


History

 [2008-10-16 18:07 UTC] rrichards@php.net
Please do not submit the same bug more than once. An existing
bug report already describes this very problem. Even if you feel
that your issue is somewhat different, the resolution is likely
to be the same. 

Thank you for your interest in PHP.

Duplicate of bug #45996
 