Bug #50545 entities being dropped, schema validation fails
Submitted: 2009-12-21 17:45 UTC Modified: 2009-12-22 18:05 UTC
From: aclark at wayfm dot com Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.2.12 OS: Gentoo Linux
Private report: No CVE-ID: None
 [2009-12-21 17:45 UTC] aclark at wayfm dot com
Description:
------------
After a recent PHP upgrade (to 5.2.11-r1), some existing code on a few of my sites suddenly "broke."

In both instances, it's XML-related PHP code that silently and completely drops HTML entities from the XML it processes.

In one instance, it's an RSS feed. "<content:encoded>&lt;p&gt;Lorem..." becomes "<content:encoded>pLorem..."

The (newly) offending code contains the xml_parse_into_struct function.
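
A minimal sketch of how the RSS case could be checked in isolation. The helper name `element_text` and the sample feed fragment are hypothetical and not taken from the affected site; the sketch only assumes that a healthy parser decodes `&lt;` and `&gt;` into `<` and `>` inside the element's character data.

```php
<?php
// Hypothetical helper: concatenates the character data that
// xml_parse_into_struct() reports for every element named $tag.
function element_text($xml, $tag)
{
    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    $values = array();
    $index = array();
    xml_parse_into_struct($parser, $xml, $values, $index);

    $text = '';
    foreach ($values as $entry) {
        if ($entry['tag'] === $tag && isset($entry['value'])) {
            $text .= $entry['value'];
        }
    }
    return $text;
}

// Made-up feed fragment modelled on the RSS example above.
$feed = '<rss xmlns:content="http://purl.org/rss/1.0/modules/content/">'
      . '<content:encoded>&lt;p&gt;Lorem&lt;/p&gt;</content:encoded>'
      . '</rss>';

$decoded = element_text($feed, 'content:encoded');
echo ($decoded === '<p>Lorem</p>')
    ? "entities preserved\n"
    : "entities dropped: $decoded\n";
```

If the second message appears, the entity handling underneath the xml extension is likely at fault rather than the calling code.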


In the other, it's a CDATA section of an XML-RPC ping. Same problem. The entity-escaped tags are preserved, but without the surrounding lt and gt entities, rendering the payload useless.

This code uses DOMDocument::loadXML and schemaValidate.

Searching a bit turned up the desiccated carcass of bug #35271, but nothing recent that I could find.

Downgraded to PHP 5.2.9-r2. Same problem.

Reproduce code:
---------------
    libxml_use_internal_errors(true);
    $xdoc = new DOMDocument();
    $xml = $params[1]; // payload from the incoming XML-RPC ping
    if (!$xml) {
        xmlrpc_error(10, "No payload detected.");
    }

    $xmlschema = 'payload2.xsd';
    $xdoc->loadXML($xml);

    if ($xdoc->schemaValidate($xmlschema)) {
        // ... (body omitted in the report)
    }

Expected result:
----------------
$xml (the payload from the incoming XML-RPC ping) is successfully validated against the schema document, the schemaValidate if statement is true, and the code inside is executed.

Actual result:
--------------
Schema validation fails with "The document has no document element." A dump of the payload reveals that lt and gt entities have been stripped from the payload: tag attr="true"tag attr="10046"tag /tagtagTag Contents/tagtag  Tag Contents/tag/tag/tag //tag

schemaValidate if statement is false, else code (omitted) is executed, returning aforementioned error to RPC client.
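
A self-contained variant of the reproduce code might look like the sketch below. The schema, the payloads, and the function name `validate_payload` are all hypothetical stand-ins (the real `payload2.xsd` and ping payload are not part of this report). It uses `schemaValidateSource` so no file is needed, checks the return value of `loadXML`, and collects the libxml errors that `libxml_use_internal_errors(true)` would otherwise leave unread.

```php
<?php
// Hypothetical helper: returns array($ok, $messages) for a payload
// checked against an XSD that is passed in as a string.
function validate_payload($xml, $xsdSource)
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    // loadXML() returns false on malformed input, so test it before validating.
    $ok = $doc->loadXML($xml) && $doc->schemaValidateSource($xsdSource);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return array($ok, $messages);
}

// Made-up schema standing in for payload2.xsd.
$xsd = '<?xml version="1.0"?>'
     . '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
     . '<xs:element name="payload"><xs:complexType><xs:sequence>'
     . '<xs:element name="body" type="xs:string"/>'
     . '</xs:sequence></xs:complexType></xs:element>'
     . '</xs:schema>';

// An intact payload, and the same payload with its angle brackets stripped
// the way the dump above shows.
$intact   = '<payload><body>Tag Contents</body></payload>';
$stripped = 'payloadbodyTag Contents/body/payload';

foreach (array($intact, $stripped) as $xml) {
    list($ok, $messages) = validate_payload($xml, $xsd);
    echo ($ok ? "valid" : "invalid: " . implode('; ', $messages)), "\n";
}
```

Reporting the parse errors separately from the validation result makes it easier to see that the payload was already damaged before schemaValidate ran.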

 [2009-12-22 09:11 UTC] jani@php.net
1. What libxml version is PHP compiled with? (check from phpinfo()..)
2. Your reproducing script has some problems since it's not possible to copy'n'paste it and just run it..
 [2009-12-22 16:10 UTC] aclark at wayfm dot com
Compiled against libxml2-2.7.2-r2.
 [2009-12-22 16:45 UTC] rrichards@php.net
Try upgrading libxml. Versions 2.7.0 - 2.7.2 had broken some entity handling, and this appears to be that.
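
As a rough sketch, the libxml version PHP was compiled against can also be read from the `LIBXML_DOTTED_VERSION` constant instead of phpinfo(). The function name below is hypothetical, and the range it tests is simply the 2.7.0 - 2.7.2 window mentioned above.

```php
<?php
// Hypothetical check: true when the given libxml version falls inside the
// 2.7.0 - 2.7.2 range reported to have broken entity handling.
function libxml_entity_handling_affected($version)
{
    return version_compare($version, '2.7.0', '>=')
        && version_compare($version, '2.7.2', '<=');
}

// LIBXML_DOTTED_VERSION is defined by PHP's libxml extension.
echo 'libxml ', LIBXML_DOTTED_VERSION, ': ',
    libxml_entity_handling_affected(LIBXML_DOTTED_VERSION)
        ? "in the affected range\n"
        : "outside the affected range\n";
```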
 [2009-12-22 17:26 UTC] aclark at wayfm dot com
Did a search based on rrichards' comment and found bug #45996.

Upgraded to libxml2-2.7.3-r2, built php-5.2.11-r1 against it, and restarted apache. Both problems have been resolved.

Thanks for your help and sorry to post a non-bug.

Should this be closed as a duplicate of bug #45996?
 [2009-12-22 18:05 UTC] rrichards@php.net
Dupe of bug #45996