[2013-11-12 17:31 UTC] php at kenman dot net
Description:
------------
Since PHP 5.4.16 (at least), simplexml_load_string() incorrectly interprets some XML strings. For specific inputs with empty nodes, node names are converted to integers.
Works as expected in 5.3.10 (libxml 2.7.7); does not work as expected in 5.4.16 (libxml 2.7.8) or 5.5.6-dev (libxml 2.9.1). Seems to be related to libxml2's xmlReadMemory(), though don't quote me on that.
Further observations:
* Removing the <b/> node results in the expected output, though otherwise manipulating <b/> (adding text content, including a closing tag, etc.) has no effect.
* Changing the problematic node to non-selfclosing has no effect.
* Adding a value to <x>, such as <x>1</x>, results in the expected output.
* Adding an attribute to <x> has no effect.
* Including the XML prologue has no effect.
* libxml_use_internal_errors(true) does not report any errors.
* The libxml option LIBXML_NOEMPTYTAG seems to have no effect.
Test script:
---------------
var_dump(simplexml_load_string('<a><b/><c><x/></c></a>')->c);
Expected result:
----------------
object(SimpleXMLElement)#2 (1) {
["x"]=>
object(SimpleXMLElement)#3 (0) {
}
}
Actual result:
--------------
object(SimpleXMLElement)#2 (1) {
[0]=>
object(SimpleXMLElement)#3 (1) {
[0]=>
object(SimpleXMLElement)#4 (0) {
}
}
}
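The variations listed under "Further observations" can be compared side by side. The following is only a rough sketch: the `describe_variants()` helper and the variant labels are hypothetical names introduced for illustration, and the keys printed will depend on the PHP and libxml versions in use. It relies on get_object_vars(), which exposes the same properties that var_dump() prints.

```php
<?php
// Hypothetical comparison harness; labels and function name are illustrative only.
$variants = array(
    'original'           => '<a><b/><c><x/></c></a>',
    'without <b/>'       => '<a><c><x/></c></a>',
    '<x> with a value'   => '<a><b/><c><x>1</x></c></a>',
    '<x> with attribute' => '<a><b/><c><x id="1"/></c></a>',
    'space before <x/>'  => '<a><b/><c> <x/></c></a>',
);

function describe_variants(array $variants) {
    $report = array();
    foreach ($variants as $label => $xml) {
        $c = simplexml_load_string($xml)->c;
        // Collect the property keys that var_dump() would show for <c>.
        $report[$label] = array_keys(get_object_vars($c));
    }
    return $report;
}

foreach (describe_variants($variants) as $label => $keys) {
    echo $label, ': ', json_encode($keys), PHP_EOL;
}
```

On an affected build, the variants reported as broken should show an integer key where the others show "x".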
Patches: sxe-var-dump (last revision 2015-05-26 14:27 UTC by cmb@php.net)
Curiously, the element is actually present at the expected string key, even though it's not represented in the raw var_dump():
var_dump(isset(simplexml_load_string('<a><b/><c><x/></c></a>')->c->x));
That reports TRUE, even though var_dump() exposes no such element (nor does get_object_vars()).

This just keeps getting stranger; apparently, whitespace influences this behavior. Both of these lines produce the expected output:
var_dump(simplexml_load_string('<a><b/><c> <x/></c></a>')->c);
var_dump(simplexml_load_string('<a><b/><c><x/> </c></a>')->c);
However, whitespace added at any other point in the string still produces the unexpected output.

I respectfully disagree: this is not merely a var_dump() issue. There may well be var_dump() anomalies here, but they are not the root of the problem. The bug surfaced for me as missing data in the final output (JSON). I traced the data stream back and discovered what's reported in this ticket. Here's a reproduction that doesn't use var_dump():
echo json_encode(simplexml_load_string('<a><b/><c><x/></c></a>')), PHP_EOL;
echo json_encode(simplexml_load_string('<a><b/><c> <x/></c></a>'));

The issue still occurs for this specific XML:
print_r(simplexml_load_string('<a><c><inner/></c><c><inner>ddd</inner></c></a>'));
PHP 5.5.15 (CentOS 7) returns:
SimpleXMLElement Object
(
    [c] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [0] => SimpleXMLElement Object
                        (
                        )
                )
            [1] => SimpleXMLElement Object
                (
                    [inner] => ddd
                )
        )
)
but it should return (PHP 5.3.10, Ubuntu 12.10):
SimpleXMLElement Object
(
    [c] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [inner] => SimpleXMLElement Object
                        (
                        )
                )
            [1] => SimpleXMLElement Object
                (
                    [inner] => ddd
                )
        )
)

When was this bug "fixed" in backports, then? It does not seem to be fixed in 5.6.11...
php > var_dump(simplexml_load_string('<one><two><three> </three></two></one>'));
object(SimpleXMLElement)#1 (1) {
  ["two"]=>
  object(SimpleXMLElement)#2 (1) {
    ["three"]=>
    object(SimpleXMLElement)#3 (1) {
      [0]=>
      string(3) " "
    }
  }
}
php > var_dump(simplexml_load_string('<one><two><three>somevalue</three></two></one>'));
object(SimpleXMLElement)#1 (1) {
  ["two"]=>
  object(SimpleXMLElement)#2 (1) {
    ["three"]=>
    string(9) "somevalue"
  }
}
php > var_dump(new SimpleXmlElement('<one><two><three>somevalue</three></two></one>'));
object(SimpleXMLElement)#1 (1) {
  ["two"]=>
  object(SimpleXMLElement)#2 (1) {
    ["three"]=>
    string(9) "somevalue"
  }
}
php > var_dump(new SimpleXmlElement('<one><two><three> </three></two></one>'));
object(SimpleXMLElement)#1 (1) {
  ["two"]=>
  object(SimpleXMLElement)#2 (1) {
    ["three"]=>
    object(SimpleXMLElement)#3 (1) {
      [0]=>
      string(3) " "
    }
  }
}
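
For the empty-node case discussed earlier in this thread, code that only needs the child element names may be able to avoid the dumped properties altogether. The sketch below is an assumption, not a confirmed fix: `child_names()` is a hypothetical helper (not part of SimpleXML), and the thread only confirms that isset() still sees the element, not that iteration is unaffected on every build. It walks the node with children() and reads each name with getName() instead of relying on var_dump(), get_object_vars(), or json_encode().

```php
<?php
// Hypothetical helper (not part of SimpleXML): list child element names
// by iterating the node instead of inspecting its dumped properties.
function child_names(SimpleXMLElement $node) {
    $names = array();
    foreach ($node->children() as $child) {
        $names[] = $child->getName();
    }
    return $names;
}

$c = simplexml_load_string('<a><b/><c><x/></c></a>')->c;
var_dump(isset($c->x));                       // reported as TRUE in this thread
echo implode(',', child_names($c)), PHP_EOL;  // names as seen by iteration
```

If the names printed here are correct while var_dump() shows an integer key, that would point to the property-dump path rather than the parsed tree.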