Bug #30692 Behaviour change in SAX causes breakage from php4 -> php5
Submitted: 2004-11-05 15:24 UTC Modified: 2004-11-05 15:34 UTC
From: chrivers at iversen-net dot dk Assigned:
Status: Not a bug Package: XML related
PHP Version: 5.0.2 OS: Linux 2.6.5, Debian Sarge
Private report: No CVE-ID: None
 [2004-11-05 15:24 UTC] chrivers at iversen-net dot dk
Description:
------------
When converting my pages to the PHP5 SAX XML parser, they 
broke because of an apparent incompatibility. The 
character-data handler is called in a different pattern than in 
PHP4. Before, it seemed to be called once per character 
block. Now, the buffer seems to be flushed before each block of 
high-bit characters. This is unexpected and 
(seemingly?) impossible to change.  

Reproduce code:
---------------
<?php
// Element handlers (only used if the xml_set_element_handler() line is enabled).
function es($parser, $name, $attrs) {}
function ee($parser, $name) {}
// Character-data handler: print each chunk the parser delivers, in brackets.
function cd($P, $D) { print "[$D]\n"; }

# $str = "UTF:æøå:UTF"; $strenc = "utf-8";
$str = "ISO:\xE6\xF8\xE5:ISO"; $strenc = "iso-8859-1";   // "æøå" as ISO-8859-1 bytes

$buffer = "<?xml version=\"1.0\" encoding=\"$strenc\"?><global>$str</global>";

$xml_parser = xml_parser_create();
# xml_set_element_handler($xml_parser, "es", "ee");
xml_set_character_data_handler($xml_parser, "cd");
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, "iso-8859-1");

// xml_parse() returns 1 on success and 0 on failure; true marks the final chunk.
if (!xml_parse($xml_parser, $buffer, true)) {
    die(sprintf("TV import error: %s at line %d col %d\n%s",
                xml_error_string(xml_get_error_code($xml_parser)),
                xml_get_current_line_number($xml_parser),
                xml_get_current_column_number($xml_parser),
                $buffer));
}

xml_parser_free($xml_parser);
?>


Expected result:
----------------
expected: [ISO:æøå:ISO] 
php4: [ISO:æøå:ISO] 
 

Actual result:
--------------
[ISO:] 
[æøå:ISO] 
 
 


History

 [2004-11-05 15:34 UTC] derick@php.net
This is a change, but nothing wrong as a SAX parser just fires events. It might break up character data and this is normal behavior.
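Following the comment above, a character-data handler should not assume it is called once per text node. Instead it can append each chunk to a buffer and only use the text when the enclosing element closes. The following is a minimal sketch, assuming PHP 7.1 or later (closures, `void` return types, short list syntax); the variables `$stack` and `$texts` are illustrative names, not part of the original report.

```php
<?php
// Sketch (assumes PHP >= 7.1): collect character data per element so that
// it no longer matters how many chunks the SAX parser splits it into.
// $stack and $texts are hypothetical names chosen for this example.

$buffer = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>"
        . "<global>ISO:\xE6\xF8\xE5:ISO</global>";   // "æøå" as ISO-8859-1 bytes

$stack = [];   // one text buffer per currently open element
$texts = [];   // completed [element name, full text] pairs

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, true);
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');

xml_set_element_handler(
    $parser,
    // Start tag: open a fresh buffer for this element.
    function ($p, string $name, array $attrs) use (&$stack): void {
        $stack[] = '';
    },
    // End tag: the buffer now holds all chunks of this element's own text.
    function ($p, string $name) use (&$stack, &$texts): void {
        $texts[] = [$name, array_pop($stack)];
    }
);

xml_set_character_data_handler(
    $parser,
    // Called zero or more times per text node; just append each chunk.
    function ($p, string $data) use (&$stack): void {
        if ($stack) {
            $stack[count($stack) - 1] .= $data;
        }
    }
);

if (!xml_parse($parser, $buffer, true)) {
    die(sprintf("XML error: %s at line %d\n",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)));
}

foreach ($texts as [$name, $value]) {
    printf("%s: [%s]\n", $name, $value);
}
```

Whether the parser delivers `ISO:` and `æøå:ISO` in one call or two, `$texts` ends up with a single entry for `GLOBAL` holding the complete text. Keeping one buffer per open element (rather than a single string) also keeps a parent's text separate from text inside its children.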
 
PHP Copyright © 2001-2020 The PHP Group
All rights reserved.
Last updated: Tue Nov 24 06:01:23 2020 UTC