Doc Bug #54735 xml_parse_into_struct gives uncomplete array
Submitted: 2011-05-14 13:29 UTC Modified: 2015-05-14 17:10 UTC
Votes:3
Avg. Score:4.3 ± 0.9
Reproduced:2 of 2 (100.0%)
Same Version:2 (100.0%)
Same OS:0 (0.0%)
From: phpnet at phoen dot de Assigned:
Status: Analyzed Package: *XML functions
PHP Version: 5.3.6 OS: Windows XPSP2
Private report: No CVE-ID: None

 [2011-05-14 13:29 UTC] phpnet at phoen dot de
Description:
------------
Hi,

Configuration: Windows XP with XAMPP 1.7.3 (nothing special).

xml_parse_into_struct() returns an incomplete array if the input contains an unexpected character (possibly a UTF-8 issue).
Everything after the "€" is missing from the array. No error is raised and no garbled character appears.
Setting parser options does not help.

Surprisingly, running

$myxml=utf8_encode($myxml);

before parsing fixes the problem.

Thanks, H.

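A likely reason the workaround "works": utf8_encode() always treats its input as ISO-8859-1, so every byte becomes valid UTF-8 and the parser no longer stops. It does not recover the euro sign, though: the ISO-8859-15 byte 0xA4 becomes "\xC2\xA4", i.e. U+00A4 ("¤"), and utf8_encode() is deprecated as of PHP 8.2. The sketch below instead converts from the string's real encoding before parsing. It assumes the source bytes are ISO-8859-15, that ext/iconv is available, and PHP 5.4 or later for the short array syntax.

```php
<?php
// Sketch, assuming the source string is ISO-8859-15 and ext/iconv is available.
// "\xA4" is the single ISO-8859-15 byte for the euro sign.
$latin9Xml = "<XML><FELD1>blubb</FELD1><FELD2>dies hier sonderzeichen \xA4</FELD2>"
           . "<FELD3>feld3</FELD3></XML>";

// Convert from the real source encoding to UTF-8, the parser's default input encoding.
$utf8Xml = iconv('ISO-8859-15', 'UTF-8', $latin9Xml);

$p = xml_parser_create();
$vals = [];
$status = xml_parse_into_struct($p, $utf8Xml, $vals); // 1 on success, 0 on failure

// Look up the text content of FELD2 (tag names are upper-cased by default).
$feld2 = '';
foreach ($vals as $entry) {
    if ($entry['tag'] === 'FELD2' && isset($entry['value'])) {
        $feld2 = $entry['value'];
    }
}

echo $status, ': ', $feld2, "\n";
```

If the file is actually Windows-1252 (where "€" is byte 0x80), the first argument to iconv() would have to be 'Windows-1252' instead.
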
Test script:
---------------
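<?php
// The outcome depends on the encoding this file is saved in: saved as UTF-8,
// "€" is valid and the whole document is parsed; saved as ISO-8859-15 or
// Windows-1252, "€" is a single byte that is invalid UTF-8 and parsing stops there.
$myxml="<XML><FELD1>blubb</FELD1><FELD2>dies hier sonderzeichen €</FELD2><FELD3>feld3</FELD3></XML>";

$p = xml_parser_create();
xml_parse_into_struct($p, $myxml, $vals, $index);
xml_parser_free($p);

print_r($vals);
?>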
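Because the result depends on the bytes stored in the file, a variant that writes "€" as escaped bytes reproduces both behaviours regardless of how the script is saved. This is a sketch assuming PHP 5.5 or later (for array_column()); parseStruct() is a hypothetical helper, not part of the report.

```php
<?php
// Hypothetical helper: parse $xml and return [status, list of tag names].
function parseStruct($xml)
{
    $p = xml_parser_create();
    $vals = [];
    $status = xml_parse_into_struct($p, $xml, $vals); // 1 on success, 0 on failure
    return [$status, array_column($vals, 'tag')];
}

// Same document as the test script, with a placeholder for the euro sign.
$template = '<XML><FELD1>blubb</FELD1><FELD2>dies hier sonderzeichen %s</FELD2>'
          . '<FELD3>feld3</FELD3></XML>';

// "€" as UTF-8 (three bytes) versus ISO-8859-15 (one byte, not valid UTF-8).
list($utf8Status, $utf8Tags) = parseStruct(sprintf($template, "\xE2\x82\xAC"));
list($latin9Status, $latin9Tags) = parseStruct(sprintf($template, "\xA4"));

echo 'UTF-8:       ', $utf8Status, ' ', implode(',', $utf8Tags), "\n";
echo 'ISO-8859-15: ', $latin9Status, ' ', implode(',', $latin9Tags), "\n";
```

In the second case the status is 0 and FELD3 never appears, matching the truncated array in the report.
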

Expected result:
----------------
Something like this:

Array (
  [0] => Array ( [tag] => XML [type] => open [level] => 1 [value] => )
  [1] => Array ( [tag] => FELD1 [type] => complete [level] => 2 [value] => blubb )
  [2] => Array ( [tag] => XML [value] => [type] => cdata [level] => 1 )
  [3] => Array ( [tag] => FELD2 [type] => complete [level] => 2 [value] => dies hier sonderzeichen € )
  [4] => Array ( [tag] => XML [value] => [type] => cdata [level] => 1 )
  [5] => Array ( [tag] => FELD3 [type] => complete [level] => 2 [value] => feld3 )
)

Actual result:
--------------
Array (
  [0] => Array ( [tag] => XML [type] => open [level] => 1 [value] => )
  [1] => Array ( [tag] => FELD1 [type] => complete [level] => 2 [value] => blubb )
  [2] => Array ( [tag] => XML [value] => [type] => cdata [level] => 1 )
  [3] => Array ( [tag] => FELD2 [type] => open [level] => 2 [value] => dies hier sonderzeichen )
)

History

 [2012-04-26 20:32 UTC] peter dot e dot lind at gmail dot com
This doesn't seem to be a bug on Linux (PHP 5.3.3-7+squeeze8 with Suhosin-Patch
(cli)): when I save the test script as UTF-8, the output is as expected.
However, if I save it as ISO-8859-15, parsing stops at the illegal character,
as one would expect. The ISO-8859-15 "€" is a single byte that is not valid
UTF-8, and XML parsers must stop when they reach ill-formed content.

The docs could be improved, though: http://dk.php.net/manual/en/function.xml-parser-create.php
states that "The supported encodings are ISO-8859-1, UTF-8 and US-ASCII.", but in
context this appears to apply only to the output encoding. The behaviour suggests
that these three character sets also restrict the input, and some comments in
the PHP source code suggest the same.
 [2015-05-14 15:24 UTC] cmb@php.net
-Type: Bug
+Type: Documentation Problem
-Assigned To:
+Assigned To: cmb
 [2015-05-14 15:24 UTC] cmb@php.net
Indeed, Peter, this is not a bug in the XML parser; rather, invalid
characters have been passed in. That could have been detected by
checking the return value of xml_parse_into_struct() (0 on failure)
and passing xml_get_error_code() to xml_error_string().

I agree that the docs need improvement, even though I'm not sure
yet what exactly the XML parser supports.
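A sketch of that check: xml_parse_into_struct() returns 0 on failure, and xml_get_error_code(), xml_error_string(), xml_get_current_line_number() and xml_get_current_byte_index() report what went wrong and where. The helper name parseStructOrFail() and the choice to throw a RuntimeException are illustrative assumptions, not an established API.

```php
<?php
// Hypothetical helper: parse into a struct, or throw with the parser's error details.
function parseStructOrFail($xml)
{
    $p = xml_parser_create();
    $vals = [];
    // Compare against 1 so that both 0 and false count as failure.
    if (xml_parse_into_struct($p, $xml, $vals) !== 1) {
        $code = xml_get_error_code($p);
        throw new RuntimeException(sprintf(
            'XML error %d (%s) at line %d, byte %d',
            $code,
            xml_error_string($code),
            xml_get_current_line_number($p),
            xml_get_current_byte_index($p)
        ));
    }
    return $vals;
}

// "\xA4" is the ISO-8859-15 euro sign, which is not valid UTF-8.
try {
    parseStructOrFail("<XML><FELD2>dies hier sonderzeichen \xA4</FELD2></XML>");
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

With this check in place, the truncated array from the report becomes an explicit error instead of silently missing data.
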
 [2015-05-14 17:10 UTC] cmb@php.net
-Status: Assigned
+Status: Analyzed
-Assigned To: cmb
+Assigned To:
 [2015-05-14 17:10 UTC] cmb@php.net
Setting the *input* encoding to 'ISO-8859-1' in
xml_parser_create() is already ignored as of PHP 5.0.0[3], see
<http://3v4l.org/eb2k1>. The input encoding is solely specified by
the encoding attribute of the document's XML declaration (which
defaults to UTF-8 if omitted), see <http://3v4l.org/tUDnV>. The
$encoding parameter of xml_parser_create() specifies only the
output encoding.

The behavior may differ when PHP is built with expat instead of libxml2 as the XML back-end.
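The following sketch illustrates the split described above: the document's XML declaration states the input encoding, while the argument to xml_parser_create() only selects the output (target) encoding. It assumes the libxml2 back-end, built with ISO-8859-15 support (usually via iconv), and PHP 5.4 or later.

```php
<?php
// The XML declaration states the input encoding; "\xA4" is the euro sign in ISO-8859-15.
$xml = '<?xml version="1.0" encoding="ISO-8859-15"?>'
     . "<XML><FELD2>dies hier sonderzeichen \xA4</FELD2></XML>";

// The argument of xml_parser_create() selects the output (target) encoding only.
$p = xml_parser_create('UTF-8');
$vals = [];
$status = xml_parse_into_struct($p, $xml, $vals); // 1 on success, 0 on failure

// Find the text of FELD2, which the parser delivers in the target encoding.
$feld2 = '';
foreach ($vals as $entry) {
    if ($entry['tag'] === 'FELD2' && isset($entry['value'])) {
        $feld2 = $entry['value'];
    }
}

echo $status, ' ', bin2hex($feld2), "\n";
```

Removing the declaration from $xml should make the same bytes fail to parse, since the input is then taken to be UTF-8.
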
 
PHP Copyright © 2001-2019 The PHP Group
All rights reserved.
Last updated: Mon Mar 25 16:01:26 2019 UTC