Bug #50545 entities being dropped, schema validation fails
Submitted: 2009-12-21 17:45 UTC Modified: 2009-12-22 18:05 UTC
From: aclark at wayfm dot com Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.2.12 OS: Gentoo Linux
Private report: No CVE-ID: None
 [2009-12-21 17:45 UTC] aclark at wayfm dot com
Description:
------------
After a recent PHP upgrade (to 5.2.11-r1), some existing code on a few of my sites suddenly "broke."

In both instances, XML-related PHP code silently and completely drops HTML entities from the XML it processes.

In one instance, it's an RSS feed. "<content:encoded>&lt;p&gt;Lorem..." becomes "<content:encoded>pLorem..."

The (newly) offending code calls the xml_parse_into_struct() function.
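
A minimal sketch of a self-check for this symptom, assuming a stand-in element name and sample text rather than the real feed. The function name entitiesSurviveXmlParser is hypothetical; it only parses a tiny document with escaped markup and confirms the decoded value keeps its angle brackets.

```php
<?php
// Hypothetical self-check, not taken from the affected sites: parse a tiny
// document containing escaped markup and verify the entities are decoded
// to "<" and ">" rather than silently dropped.
function entitiesSurviveXmlParser()
{
    $parser = xml_parser_create();
    $values = array();
    $ok = xml_parse_into_struct(
        $parser,
        '<content>&lt;p&gt;Lorem&lt;/p&gt;</content>',
        $values
    );

    // xml_parse_into_struct() returns 1 on success and 0 on failure.
    if ($ok !== 1) {
        return false;
    }

    // The single element is reported with its character data under "value".
    return isset($values[0]['value']) && $values[0]['value'] === '<p>Lorem</p>';
}

echo entitiesSurviveXmlParser() ? "entities preserved\n" : "entities dropped\n";
```

If the parser shows the behavior described above, the value would come back as "pLorem/p" and the check would report the entities as dropped.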


In the other instance, it's a CDATA section of an XML-RPC ping, with the same problem: the entity-escaped tags are preserved, but without the surrounding lt and gt entities, which renders the payload useless.

This code uses DOMDocument::loadXML() and schemaValidate().

Searching a bit turned up the desiccated carcass of bug #35271, but nothing recent that I could find.

Downgraded to PHP 5.2.9-r2. Same problem.

Reproduce code:
---------------
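    // Collect libxml errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $xdoc = new DOMDocument();
    // $params is supplied by the XML-RPC handler (not shown in this report).
    $xml = $params[1];
    if (!$xml) {
        xmlrpc_error(10, "No payload detected.");
    }

    $xmlschema = 'payload2.xsd';
    $xdoc->loadXML($xml);

    if ($xdoc->schemaValidate($xmlschema)) {
        // ... (code omitted in the report)
    } else {
        // ... (code omitted in the report)
    }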

Expected result:
----------------
$xml (the payload from the incoming XML-RPC ping) is successfully validated against the schema document, the schemaValidate if statement is true, and the code inside is executed.

Actual result:
--------------
Schema validation fails with "The document has no document element." A dump of the payload reveals that lt and gt entities have been stripped from the payload: tag attr="true"tag attr="10046"tag /tagtagTag Contents/tagtag  Tag Contents/tag/tag/tag //tag

The schemaValidate if statement is false, so the else code (omitted) is executed, returning the aforementioned error to the RPC client.
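
Because libxml_use_internal_errors(true) suppresses warnings, a failed loadXML() goes unnoticed until schemaValidate() complains about the missing document element. The following is a sketch only, assuming a hypothetical helper named collectXmlLoadErrors that is not part of the original code; it shows one way to surface the parse errors at load time.

```php
<?php
// Hypothetical helper (not from the original report): load an XML string and
// return the libxml error messages produced while parsing it.
function collectXmlLoadErrors($xml)
{
    // Buffer libxml errors, remembering the previous setting to restore it.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// A payload whose angle brackets were stripped is no longer well-formed XML.
foreach (collectXmlLoadErrors('tag attr="true"tag /tag') as $message) {
    echo $message, "\n";
}
```

Checking these messages (or the boolean return value of loadXML()) before calling schemaValidate() points at the damaged payload rather than at the schema.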

 [2009-12-22 09:11 UTC] jani@php.net
1. What libxml version is PHP compiled with? (Check phpinfo().)
2. Your reproducing script has some problems, since it's not possible to copy'n'paste it and just run it.
 [2009-12-22 16:10 UTC] aclark at wayfm dot com
Compiled against libxml2-2.7.2-r2.
 [2009-12-22 16:45 UTC] rrichards@php.net
Try upgrading libxml. Versions 2.7.0 - 2.7.2 broke some entity handling, and this appears to be that issue.
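
A quick way to check for the version range mentioned above, as a sketch: the helper name libxmlVersionIsAffected is hypothetical, and the range is taken only from this comment. LIBXML_DOTTED_VERSION is the constant PHP exposes for the libxml version it was built with.

```php
<?php
// Hypothetical helper (not part of PHP): true when the given libxml2 version
// falls inside the 2.7.0 - 2.7.2 range named in this comment.
function libxmlVersionIsAffected($version)
{
    return version_compare($version, '2.7.0', '>=')
        && version_compare($version, '2.7.2', '<=');
}

// Dotted libxml version string reported by PHP, e.g. "2.7.2".
$libxmlVersion = LIBXML_DOTTED_VERSION;

if (libxmlVersionIsAffected($libxmlVersion)) {
    echo "libxml {$libxmlVersion} is in the affected range; try upgrading.\n";
} else {
    echo "libxml {$libxmlVersion} is outside the affected range.\n";
}
```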
 [2009-12-22 17:26 UTC] aclark at wayfm dot com
Did a search based on rrichards' comment and found bug #45996.

Upgraded to libxml2-2.7.3-r2, built php-5.2.11-r1 against it, and restarted apache. Both problems have been resolved.

Thanks for your help and sorry to post a non-bug.

Should this be closed as a duplicate of bug #45996?
 [2009-12-22 18:05 UTC] rrichards@php.net
Dupe of bug #45996