Doc Bug #54735 xml_parse_into_struct gives incomplete array
Submitted: 2011-05-14 13:29 UTC Modified: 2015-05-14 17:10 UTC
Votes:3
Avg. Score:4.3 ± 0.9
Reproduced:2 of 2 (100.0%)
Same Version:2 (100.0%)
Same OS:0 (0.0%)
From: phpnet at phoen dot de Assigned:
Status: Analyzed Package: *XML functions
PHP Version: 5.3.6 OS: Windows XPSP2
Private report: No CVE-ID: None

 [2011-05-14 13:29 UTC] phpnet at phoen dot de
Description:
------------
hi,

configuration: Windows XP with XAMPP 1.7.3 (nothing special)

xml_parse_into_struct returns an incomplete array if the input contains an unexpected character (maybe a UTF-8 issue).
Everything after the "€" is missing from the array. No error is shown and no garbled character appears.
Setting parser options does not help.

Surprisingly, converting the input first with

$myxml=utf8_encode($myxml);

and parsing afterwards fixes the problem.

thanks, H.

Test script:
---------------
<?php
$myxml="<XML><FELD1>blubb</FELD1><FELD2>dies hier sonderzeichen €</FELD2><FELD3>feld3</FELD3></XML>";

// Parse the string into a flat array of tag entries and print it
$p = xml_parser_create();
xml_parse_into_struct($p, $myxml, $vals, $index);
xml_parser_free($p);

print_r($vals);
?>

Expected result:
----------------
Something like this:

Array
(
    [0] => Array ( [tag] => XML [type] => open [level] => 1 [value] => )
    [1] => Array ( [tag] => FELD1 [type] => complete [level] => 2 [value] => blubb )
    [2] => Array ( [tag] => XML [value] => [type] => cdata [level] => 1 )
    [3] => Array ( [tag] => FELD2 [type] => complete [level] => 2 [value] => dies hier sonderzeichen € )
    [4] => Array ( [tag] => XML [value] => [type] => cdata [level] => 1 )
    [5] => Array ( [tag] => FELD3 [type] => complete [level] => 2 [value] => feld3 )
)

Actual result:
--------------
Array
(
    [0] => Array ( [tag] => XML [type] => open [level] => 1 [value] => )
    [1] => Array ( [tag] => FELD1 [type] => complete [level] => 2 [value] => blubb )
    [2] => Array ( [tag] => XML [value] => [type] => cdata [level] => 1 )
    [3] => Array ( [tag] => FELD2 [type] => open [level] => 2 [value] => dies hier sonderzeichen )
)
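A minimal sketch of the conversion workaround. It assumes (this is not stated in the report) that the script was saved as Windows-1252, the usual ANSI code page on a German Windows XP install, so the € is the single byte 0x80. utf8_encode() converts from ISO-8859-1, where 0x80 is a C1 control character rather than €, so it makes the input valid UTF-8 without producing a real euro sign; it is also deprecated as of PHP 8.2. The sketch therefore uses mb_convert_encoding() from the mbstring extension instead.

```php
<?php
// Assumption: the test script was saved as Windows-1252, where the euro
// sign is the single byte 0x80 (written as "\x80" so this file stays ASCII).
$myxml = "<XML><FELD1>blubb</FELD1><FELD2>dies hier sonderzeichen \x80</FELD2>"
       . "<FELD3>feld3</FELD3></XML>";

// Re-encode to UTF-8, which the parser assumes when the document has no
// XML declaration. Requires the mbstring extension.
$utf8xml = mb_convert_encoding($myxml, 'UTF-8', 'Windows-1252');

$p = xml_parser_create();
$ok = xml_parse_into_struct($p, $utf8xml, $vals, $index);
unset($p); // drop the parser handle once parsing is done

// $index maps each (upper-cased) tag name to its positions in $vals.
$feld2 = $vals[$index['FELD2'][0]]['value'];

echo $ok === 1 ? "parsed\n" : "parse failed\n";
echo $feld2, "\n";
```

If the source file is really ISO-8859-15 rather than Windows-1252, only the third argument of mb_convert_encoding() needs to change.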

History

 [2012-04-26 20:32 UTC] peter dot e dot lind at gmail dot com
This doesn't seem to be a bug, at least on Linux (PHP 5.3.3-7+squeeze8 with
Suhosin-Patch (cli)): when I save the test script as UTF-8, the output is as
expected. If I save it as ISO-8859-15, however, parsing stops when it hits the
illegal character, as one would expect: the ISO-8859-15 encoding of € is not
valid UTF-8, and an XML parser must stop when it reaches malformed input.

The docs could be improved, though. The page
http://dk.php.net/manual/en/function.xml-parser-create.php
states that "The supported encodings are ISO-8859-1, UTF-8 and US-ASCII.", but
in context this seems to apply only to the output encoding. The observed
behaviour suggests that these three character sets also restrict the input,
and some comments in the PHP source code suggest the same.
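The following sketch reproduces Peter's observation without depending on how the file is saved, by writing the euro sign as explicit bytes: E2 82 AC in UTF-8 versus the single byte A4 in ISO-8859-15. It assumes the default libxml2-based parser and PHP 7.1 or later; parse_struct() is a hypothetical helper, not part of the XML extension.

```php
<?php
// The same document with the euro sign encoded two different ways.
$template = "<XML><FELD1>blubb</FELD1><FELD2>dies hier sonderzeichen %s</FELD2>"
          . "<FELD3>feld3</FELD3></XML>";
$utf8  = sprintf($template, "\xE2\x82\xAC"); // U+20AC encoded as UTF-8 (3 bytes)
$latin = sprintf($template, "\xA4");         // euro sign in ISO-8859-15 (1 byte)

// Hypothetical helper: parse and return the result flag, error code and values.
function parse_struct(string $xml): array
{
    $p = xml_parser_create();
    $ok = xml_parse_into_struct($p, $xml, $vals, $index);
    $code = xml_get_error_code($p);
    return [$ok, $code, $vals];
}

[$okUtf8, $codeUtf8, $valsUtf8] = parse_struct($utf8);
[$okLatin, $codeLatin, $valsLatin] = parse_struct($latin);

// Without an XML declaration the input is read as UTF-8, so the lone 0xA4
// byte is malformed and the parser stops with an error.
printf("UTF-8 input:       ok=%d, %d entries\n", $okUtf8, count($valsUtf8));
printf("ISO-8859-15 input: ok=%d, error=%s\n", $okLatin, xml_error_string($codeLatin));
```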
 [2015-05-14 15:24 UTC] cmb@php.net
-Type: Bug +Type: Documentation Problem -Assigned To: +Assigned To: cmb
 [2015-05-14 15:24 UTC] cmb@php.net
Indeed, Peter, this is not a bug in the XML parser; rather, invalid
characters were passed in. That could have been detected by checking
the return value of xml_parse_into_struct() and passing the result of
xml_get_error_code() to xml_error_string().

I agree that the docs need improvement, even though I'm not yet sure
what the XML parser actually supports.
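A sketch of that check, assuming current PHP (7.1 or later) with the libxml2 back-end. parse_into_struct_checked() is a hypothetical wrapper, not part of the XML extension: it returns null and fills $error instead of handing back a silently truncated array.

```php
<?php
// Hypothetical helper: parse $xml into a flat struct and report failures
// instead of silently returning a truncated array.
function parse_into_struct_checked(string $xml, ?string &$error = null): ?array
{
    $p = xml_parser_create();
    $ok = xml_parse_into_struct($p, $xml, $vals, $index);
    if ($ok !== 1) {
        // xml_error_string() turns the numeric code into a message;
        // the line number shows where the parser gave up.
        $error = sprintf(
            '%s at line %d',
            xml_error_string(xml_get_error_code($p)),
            xml_get_current_line_number($p)
        );
        return null;
    }
    $error = null;
    return $vals;
}

// "\x80" is not valid UTF-8, so this input is rejected.
$vals = parse_into_struct_checked("<XML><FELD2>sonderzeichen \x80</FELD2></XML>", $err);
if ($vals === null) {
    echo "Parse failed: $err\n";
}
```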
 [2015-05-14 17:10 UTC] cmb@php.net
-Status: Assigned +Status: Analyzed -Assigned To: cmb +Assigned To:
 [2015-05-14 17:10 UTC] cmb@php.net
Setting the *input* encoding to 'ISO-8859-1' in
xml_parser_create() is already ignored as of PHP 5.0.0[3], see
<http://3v4l.org/eb2k1>. The input encoding is solely specified by
the encoding attribute of the document's XML declaration (which
defaults to UTF-8 if omitted), see <http://3v4l.org/tUDnV>. The
$encoding parameter of xml_parser_create() specifies only the
output encoding.
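A minimal sketch of the input/output split, assuming the default libxml2 back-end (expat may behave differently) and PHP 7 or later. first_value() is a hypothetical helper that returns the text of the first element. The first call declares ISO-8859-1 in the XML declaration and gets UTF-8 back by default; the second feeds UTF-8 and asks xml_parser_create() for ISO-8859-1 output.

```php
<?php
// "\xE9" is "é" in ISO-8859-1; the XML declaration tells the parser so.
$latin1Doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><A>caf\xE9</A>";
// The same text encoded as UTF-8, with no declaration (UTF-8 is the default).
$utf8Doc = "<A>caf\xC3\xA9</A>";

// Hypothetical helper: $targetEncoding only selects the OUTPUT encoding.
function first_value(string $xml, ?string $targetEncoding): string
{
    $p = $targetEncoding === null
        ? xml_parser_create()
        : xml_parser_create($targetEncoding);
    xml_parse_into_struct($p, $xml, $vals);
    return $vals[0]['value'];
}

// Input encoding comes from the declaration; values come back as UTF-8.
echo bin2hex(first_value($latin1Doc, null)), "\n";
// UTF-8 input, but values are handed back in ISO-8859-1.
echo bin2hex(first_value($utf8Doc, 'ISO-8859-1')), "\n";
```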

The behavior might be different when expat is used as the back-end.
 
PHP Copyright © 2001-2018 The PHP Group
All rights reserved.
Last updated: Tue Nov 13 16:01:26 2018 UTC