Bug #64222 SimpleXML improperly parses Mixed Content Elements
Submitted: 2013-02-16 01:07 UTC Modified: 2013-02-19 23:40 UTC
From: Anxiety35 at gmail dot com Assigned:
Status: Closed Package: SimpleXML related
PHP Version: 5.4.11 OS: Linux
Private report: No CVE-ID: None
 [2013-02-16 01:07 UTC] Anxiety35 at gmail dot com
Description:
------------
The following is parsed incorrectly by SimpleXML, even though it is valid
XML:

<?xml version="1.0" standalone="yes"?>
<Response>
    <CustomsID>010912-1
        <IsApproved>NO</IsApproved>
        <ErrorMsg>Electronic refunds...</ErrorMsg>
    </CustomsID>
</Response>

For the two-record document in the test script below, SimpleXML results in:
SimpleXMLElement Object ( [CustomsID] => Array ( [0] => 010912-1 [1] => 010912-2 ) )

All child elements of <CustomsID> get dropped.

Test script:
---------------
<?php
$xml = '<?xml version="1.0" standalone="yes"?>
<Response>
    <CustomsID>010912-1
        <IsApproved>NO</IsApproved>
        <ErrorMsg>Electronic refunds...</ErrorMsg>
    </CustomsID>
    <CustomsID>010912-2
        <IsApproved>NO</IsApproved>
        <ErrorMsg>Electronic refunds...</ErrorMsg>
    </CustomsID>
</Response>';

$result = simplexml_load_string($xml);
//$result = new SimpleXMLElement($xml);

echo "Resulting object: ".print_r($result,true);

Expected result:
----------------
// Something like:
Resulting object: SimpleXMLElement Object
(
    [CustomsID] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [CustomsID] => 010912-1
                        )
                    [IsApproved] => NO
                    [ErrorMsg] => Electronic refunds...
                )

            [1] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [CustomsID] => 010912-2
                        )
                    [IsApproved] => NO
                    [ErrorMsg] => Electronic refunds...
                )
        )
)

// Or like:
Resulting object: SimpleXMLElement Object
(
    [CustomsID] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [CustomsID] => 010912-1
                    [IsApproved] => NO
                    [ErrorMsg] => Electronic refunds...
                )

            [1] => SimpleXMLElement Object
                (
                    [CustomsID] => 010912-2
                    [IsApproved] => NO
                    [ErrorMsg] => Electronic refunds...
                )
        )
)

Actual result:
--------------
Resulting object: SimpleXMLElement Object
(
    [CustomsID] => Array
        (
            [0] => 010912-1
        
            [1] => 010912-2
        )
)
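The output above is only what print_r() shows. As a minimal sketch (using the two-record document from the test script), the child elements of each <CustomsID> can still be read directly from the parsed object. The string cast returns only the element's own text, which here is the ID plus surrounding whitespace, so it is trimmed.

```php
<?php
$xml = '<?xml version="1.0" standalone="yes"?>
<Response>
    <CustomsID>010912-1
        <IsApproved>NO</IsApproved>
        <ErrorMsg>Electronic refunds...</ErrorMsg>
    </CustomsID>
    <CustomsID>010912-2
        <IsApproved>NO</IsApproved>
        <ErrorMsg>Electronic refunds...</ErrorMsg>
    </CustomsID>
</Response>';

$result = simplexml_load_string($xml);

// print_r() shows each <CustomsID> only as its text,
// but the child elements are still present in the parsed tree.
$second   = $result->CustomsID[1];
$id       = trim((string) $second);          // own text only: "010912-2"
$approved = (string) $second->IsApproved;    // "NO"
$error    = (string) $second->ErrorMsg;      // "Electronic refunds..."

echo "$id: $approved / $error\n";
```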

History

 [2013-02-16 02:18 UTC] Anxiety35 at gmail dot com
Elements that contain both character data and child elements are "mixed content", as described in section 3.2.2 ("Mixed Content") of the XML 1.0 specification.
Spec: www.w3.org/TR/REC-xml/

w3.org is down at the time of posting this comment. Hopefully it's back up soon.
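To see what mixed content looks like once parsed, here is a small sketch using PHP's DOM extension (an alternative to SimpleXML, not something used in the report) on the single-record document from the description. It lists the child nodes of <CustomsID>: the text node holding the ID sits alongside the element nodes, with whitespace-only text nodes in between because DOMDocument preserves whitespace by default.

```php
<?php
$xml = '<?xml version="1.0" standalone="yes"?>
<Response>
    <CustomsID>010912-1
        <IsApproved>NO</IsApproved>
        <ErrorMsg>Electronic refunds...</ErrorMsg>
    </CustomsID>
</Response>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$customsId = $doc->getElementsByTagName('CustomsID')->item(0);

// Record each child node's kind; text nodes are trimmed for readability.
$kinds = [];
foreach ($customsId->childNodes as $node) {
    if ($node->nodeType === XML_TEXT_NODE) {
        $kinds[] = 'text:' . trim($node->nodeValue);
    } elseif ($node->nodeType === XML_ELEMENT_NODE) {
        $kinds[] = 'element:' . $node->nodeName;
    }
}

echo implode("\n", $kinds), "\n";
```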
 [2013-02-16 02:18 UTC] Anxiety35 at gmail dot com
-Summary: SimpleXML improperly parses Elements containing both text and child elements
+Summary: SimpleXML improperly parses Mixed Content Elements
 [2013-02-19 23:40 UTC] Anxiety35 at gmail dot com
-Status: Open
+Status: Closed
 [2013-02-19 23:40 UTC] Anxiety35 at gmail dot com
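The issue turned out to be that I was deconstructing the object with PHP's
get_object_vars() function. That works in most cases, but not here: for an
element whose content begins with text and also contains child elements,
SimpleXML's property view (the same view print_r() displays) reports only the
element's text. So this is not a SimpleXML bug, and traversing the object
manually (for example with children() and a string cast) does the trick.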
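A minimal sketch of such a manual traversal, assuming the goal is the reporter's second "expected result" shape. The helper name customsIdToArray() is hypothetical; it trims the element's own text to get the ID and then copies each child element by name.

```php
<?php
// Hypothetical helper: flattens one mixed-content <CustomsID> element into an
// array shaped like the reporter's second "expected result".
function customsIdToArray(SimpleXMLElement $el): array
{
    // The string cast returns only the element's own text (not its children's),
    // so trimming leaves just the ID, e.g. "010912-1".
    $row = ['CustomsID' => trim((string) $el)];

    // children() exposes the nested elements that the property view hides.
    foreach ($el->children() as $name => $child) {
        $row[$name] = (string) $child;
    }
    return $row;
}

$xml = '<?xml version="1.0" standalone="yes"?>
<Response>
    <CustomsID>010912-1
        <IsApproved>NO</IsApproved>
        <ErrorMsg>Electronic refunds...</ErrorMsg>
    </CustomsID>
    <CustomsID>010912-2
        <IsApproved>NO</IsApproved>
        <ErrorMsg>Electronic refunds...</ErrorMsg>
    </CustomsID>
</Response>';

$result = simplexml_load_string($xml);

// Iterating $result->CustomsID visits every <CustomsID> sibling in order.
$rows = [];
foreach ($result->CustomsID as $customsId) {
    $rows[] = customsIdToArray($customsId);
}

print_r($rows);
```

Because the helper keys children by tag name, repeated child elements with the same name would overwrite each other. That is fine for this document but would need a list per name in the general case.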
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon Apr 29 19:01:30 2024 UTC