php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #66084 simplexml_load_string() mangles empty node name
Submitted: 2013-11-12 17:31 UTC Modified: 2015-10-07 14:37 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: php at kenman dot net Assigned: ab (profile)
Status: Closed Package: SimpleXML related
PHP Version: 5.5Git-2013-11-12 (snap) OS: Win7
Private report: No CVE-ID: None
 [2013-11-12 17:31 UTC] php at kenman dot net
Description:
------------
Since PHP 5.4.16 (at least), simplexml_load_string() incorrectly interprets some XML strings. For specific inputs with empty nodes, node names are converted to integers.

Works as expected in 5.3.10 (libxml 2.7.7); does not work as expected in 5.4.16 (libxml 2.7.8) or 5.5.6-dev (libxml 2.9.1). Seems to be related to libxml2's xmlReadMemory(), though don't quote me on that.

Further observations:

* Removing the <b/> node results in the expected output, though otherwise manipulating <b/> (adding text content, including a closing tag, etc.) has no effect.
* Changing the problematic node to non-selfclosing has no effect.
* Adding a value to <x>, such as <x>1</x>, results in the expected output.
* Adding an attribute to <x> has no effect.
* Including the XML prologue has no effect.
* libxml_use_internal_errors(true) does not report any errors.
* The libxml option LIBXML_NOEMPTYTAG seems to have no effect.

Test script:
---------------
var_dump(simplexml_load_string('<a><b/><c><x/></c></a>')->c);

Expected result:
----------------
object(SimpleXMLElement)#2 (1) {
  ["x"]=>
  object(SimpleXMLElement)#3 (0) {
  }
}

Actual result:
--------------
object(SimpleXMLElement)#2 (1) {
  [0]=>
  object(SimpleXMLElement)#3 (1) {
    [0]=>
    object(SimpleXMLElement)#4 (0) {
    }
  }
}

Patches

sxe-var-dump (last revision 2015-05-26 14:27 UTC by cmb@php.net)

Pull Requests

Pull requests:

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2013-11-12 19:31 UTC] php at kenman dot net
Curiously, the element is actually present at the expected string key, even though it's not represented in the raw var_dump():

var_dump(isset(simplexml_load_string('<a><b/><c><x/></c></a>')->c->x));

That will return TRUE, even though var_dump() exposes no such element (nor does get_object_vars()).
 [2013-11-14 17:23 UTC] php at kenman dot net
This just keeps getting stranger; apparently, whitespace influences this behavior.

Both of these lines produce the expected output:

    var_dump(simplexml_load_string('<a><b/><c> <x/></c></a>')->c);
    var_dump(simplexml_load_string('<a><b/><c><x/> </c></a>')->c);

However, whitespace added at any other point in the string will still produce the unexpected output.
 [2013-11-14 21:57 UTC] php at kenman dot net
Just a note for anyone else that might come across this problem...

A little background: I create the XML originally as a DOMDocument, and then I call saveXML() to generate the XML string. Then, I load that XML with simplexml_load_string() and eventually convert it to JSON (don't ask about the convoluted process, not my choice).

I was able to work-around the issue by setting DOMDocument->formatOutput = true -- as per my last note, the whitespace seems to hide the issue, and so now when I pass it to simplexml_load_string() I get the expected output.
 [2014-04-02 15:55 UTC] ab@php.net
Ref bug #52463, same var_dump() issue.
 [2014-04-03 17:28 UTC] php at kenman dot net
I respectfully disagree, this is not merely a var_dump() issue. It may be true that there are var_dump() anomalies here, but that's not the root of the issue.

The bug was exposed for me by missing data in the final output (JSON). I traced the data stream back and discovered what's reported in this ticket.

Here's a reproduction that doesn't use var_dump():

echo json_encode(simplexml_load_string('<a><b/><c><x/></c></a>')), PHP_EOL;
echo json_encode(simplexml_load_string('<a><b/><c> <x/></c></a>'));
 [2014-04-04 15:19 UTC] ab@php.net
-Status: Open +Status: Verified -Assigned To: +Assigned To: ab
 [2014-04-04 15:19 UTC] ab@php.net
@php at kenman dot net thanks for figuring the json thing out. What I meant previously - it actually doesn't matter which function is used because often it will lead back to the get_properties callback of the corresponding object. So this is not a var_dump issue at all :) If it's only to see with var_dump, it's not of that prio as if that affects something like json.

Thanks
 [2014-04-05 07:51 UTC] ab@php.net
Automatic comment on behalf of ab
Revision: http://git.php.net/?p=php-src.git;a=commit;h=a0beddf5e9ab3c6feaf0921be72a7f430597abea
Log: Fixed bug #66084 simplexml_load_string() mangles empty node name
 [2014-04-05 07:51 UTC] ab@php.net
-Status: Verified +Status: Closed
 [2014-04-05 08:34 UTC] ab@php.net
Automatic comment on behalf of ab
Revision: http://git.php.net/?p=php-src.git;a=commit;h=a0beddf5e9ab3c6feaf0921be72a7f430597abea
Log: Fixed bug #66084 simplexml_load_string() mangles empty node name
 [2014-04-05 08:35 UTC] ab@php.net
Automatic comment on behalf of ab
Revision: http://git.php.net/?p=php-src.git;a=commit;h=a0beddf5e9ab3c6feaf0921be72a7f430597abea
Log: Fixed bug #66084 simplexml_load_string() mangles empty node name
 [2014-04-10 04:47 UTC] tyrael@php.net
Automatic comment on behalf of ab
Revision: http://git.php.net/?p=php-src.git;a=commit;h=a0beddf5e9ab3c6feaf0921be72a7f430597abea
Log: Fixed bug #66084 simplexml_load_string() mangles empty node name
 [2014-08-19 14:46 UTC] evgeniy dot arsenyev at gmail dot com
The issue is still actual for a specific xml:
print_r(simplexml_load_string('<a><c><inner/></c><c><inner>ddd</inner></c></a>'));

PHP 5.5.15 (CentOS 7) returns:
SimpleXMLElement Object
(
    [c] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [0] => SimpleXMLElement Object
                        (
                        )

                )

            [1] => SimpleXMLElement Object
                (
                    [inner] => ddd
                )

        )

)


but it should return (PHP 5.3.10, Ubuntu 12.10):
SimpleXMLElement Object
(
    [c] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [inner] => SimpleXMLElement Object
                        (
                        )

                )

            [1] => SimpleXMLElement Object
                (
                    [inner] => dd
                )

        )

)
 [2014-10-07 23:15 UTC] stas@php.net
Automatic comment on behalf of ab
Revision: http://git.php.net/?p=php-src-security.git;a=commit;h=a0beddf5e9ab3c6feaf0921be72a7f430597abea
Log: Fixed bug #66084 simplexml_load_string() mangles empty node name
 [2014-10-07 23:26 UTC] stas@php.net
Automatic comment on behalf of ab
Revision: http://git.php.net/?p=php-src-security.git;a=commit;h=a0beddf5e9ab3c6feaf0921be72a7f430597abea
Log: Fixed bug #66084 simplexml_load_string() mangles empty node name
 [2015-05-26 14:26 UTC] cmb@php.net
-Status: Closed +Status: Re-Opened
 [2015-05-26 14:26 UTC] cmb@php.net
Re-opened because of evgeniy's comment. This bug seems to be
closely related to bug #62639.

Adding the check for !node->next in the commit apparently only
solves some issues. It seems to me that instead it has to be
checked for sxe->iter.type, because when this is SXE_ITER_NONE it
doesn't make sense to execute the else clause.

With the attached patch sxe-var-dump.patch all ext/simplexml/tests
run fine, as well as the test script of bug #69491.

Anantol, can you please have a look at the issue. If useful, I can
make a pull request with some further tests regarding whitespace
issues.
 [2015-05-26 14:27 UTC] cmb@php.net
The following patch has been added/updated:

Patch Name: sxe-var-dump
Revision:   1432650439
URL:        https://bugs.php.net/patch-display.php?bug=66084&patch=sxe-var-dump&revision=1432650439
 [2015-05-26 15:59 UTC] ab@php.net
Hi Christoph,

thanks for following on this. I don't have all the details in my head, however you go into the same direction. Whether the check you've found works better only the tests can decide, i'd say. If it is - so lets take it.

One note though. The trickiness of this ticket is - we deal not only with PHP itself actually, but with the behavior change in libxml and the previous fix might have been not 100% sufficient. 

IMHO what should be done - we should ensure that this fix doesn't break things when PHP is linked with 2.7.x to 2.9.x. Probably the easiest done on Linux when fetching particular libxml version, installing into some prefix and compiling PHP with it.

Are you in the mood to do this? If it's ok under say PHP-5.5 and libxml 2.7.7 2.7.8 2.8.0 2.9.1 2.9.2 - then please create a PR with all the aggregated tests from the tickets you've found related and ping me so i'll test it as well. Does it sound eligible to you?

Thanks.
 [2015-05-26 19:06 UTC] cmb@php.net
Thanks, Anatol, for the quick reply. I've already committed tests
for four closely related bug reports (actually, they're duplicates)
to my sxe-var-dump branch[1]. However, I've noticed that bug #68946
and bug #67572, which I deemed related to this one, won't be fixed
with the given patch. I'll investigate.

With regard to older libxml versions: it seems very reasonable to
test with these. I'll see if I can build the mentionend versions
of libxml2 on Windows (some preliminary tests were successful).

[1] <https://github.com/cmb69/php-src/tree/sxe-var-dump>
 [2015-05-26 20:40 UTC] ab@php.net
Christoph,

building older libxml on Windows doesn't make sense, so you can spare the work. libxml 2.9.2 is used now from 5.5 to 7 and people usually use our dep builds, thus the firm versions we've tested. Thus testing 2.9.2 which is provided on Windows is enough.

But the change will most likely affect Linux as the variety of libxml versions is used in the distributions while the code in PHP is the same (fe see the description of this report) and is also shared across PHP versions. On Linux it's done using say ./configure=/install/to/libxml-x.y.z && make install and then pointing PHP to there --with-libxml-dir.

Thanks
 [2015-05-27 11:08 UTC] cmb@php.net
Ah, I see, Anatol. I'll do the testing under Linux; I have to set
up a build environment (VM) first, so it'll take a while.

FWIW: I've already committed more changes to the mentioned branch;
tested with 5.5.25, 5.6.9 and current master on Windows.
 [2015-05-27 11:41 UTC] ab@php.net
ups, i meant ./configure --prefix=/install/to/libxml-x.y.z ... was too late at night :)

Oki, please ping back when you think i should do the verify builds.

Thanks.
 [2015-05-27 18:24 UTC] cmb@php.net
I've submitted the PR: <https://github.com/php/php-src/pull/1306>. Please have a look at it, Anatol.
 [2015-05-27 21:38 UTC] ab@php.net
Hi Christoph,

the PR looks fine, I'll merge it within days. How would you suggest to backport for 5.5 and 5.6? I guess reverting 18f9589 were enough? or maybe you could create a separate PR for 5.5?

Thanks.
 [2015-05-28 09:57 UTC] cmb@php.net
Hi Anatol,

would cherry-picking the relevant commtis be an option? Or maybe
just merging all commits and resolving the conflict of 18f9589 by
rejecting? I can also make a separate PR for PHP 5, but then there
is most likely no direct "link" between the commits, to easily
follow later. Another option might be to rework the PR, so that
there are only two commits (the first one for all versions, the
second one for master only).

Please let me know what you prefer.
 [2015-05-29 19:26 UTC] ab@php.net
-Status: Re-Opened +Status: Closed
 [2015-05-29 19:26 UTC] ab@php.net
Hi Christoph,

I've merged this into master. git am went not good, but patch worked while backporting to 5.5. Please do some check on that.

Maybe you were up to check those test fails that were persisting till now? If so, please just use another PR ;)

Thanks.
 [2015-05-30 12:29 UTC] cmb@php.net
Thanks for merging, Anatol. The backports are fine (manual code
review + running the simplexml test suite).

I'll have a closer look at the non-passing tests.
 [2015-10-06 19:51 UTC] bjcardon at gmail dot com
When was this bug "fixed" then in backports? It does not seem to be
fixed in 5.6.11...

php > var_dump(simplexml_load_string('<one><two><three>  
</three></two></one>'));
object(SimpleXMLElement)#1 (1) {
  ["two"]=>
  object(SimpleXMLElement)#2 (1) {
    ["three"]=>
    object(SimpleXMLElement)#3 (1) {
      [0]=>
      string(3) "   "
    }
  }
}
php >
var_dump(simplexml_load_string('<one><two><three>somevalue</three></two></one>'));
  
object(SimpleXMLElement)#1 (1) {
  ["two"]=>
  object(SimpleXMLElement)#2 (1) {
    ["three"]=>
    string(9) "somevalue"
  }
}
php > var_dump(new
SimpleXmlElement('<one><two><three>somevalue</three></two></one>'));
object(SimpleXMLElement)#1 (1) {
  ["two"]=>
  object(SimpleXMLElement)#2 (1) {
    ["three"]=>
    string(9) "somevalue"
  }
}
php > var_dump(new SimpleXmlElement('<one><two><three>  
</three></two></one>'));
object(SimpleXMLElement)#1 (1) {
  ["two"]=>
  object(SimpleXMLElement)#2 (1) {
    ["three"]=>
    object(SimpleXMLElement)#3 (1) {
      [0]=>
      string(3) "   "
    }
  }
}
 [2015-10-07 14:37 UTC] ab@php.net
@bjcardon, but your snippets show that it is fixed. The initial report, as well as why the ticket was reopened, is that numeric indexes was used instead of the actual tag names. This is fixed. If you think there's another issue, please open another ticket.

Thanks.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 08:01:29 2024 UTC