Bug #30692 Behaviour change in SAX causes breakage from php4 -> php5
Submitted: 2004-11-05 15:24 UTC Modified: 2004-11-05 15:34 UTC
From: chrivers at iversen-net dot dk Assigned:
Status: Not a bug Package: XML related
PHP Version: 5.0.2 OS: Linux 2.6.5, Debian Sarge
Private report: No CVE-ID: None
 [2004-11-05 15:24 UTC] chrivers at iversen-net dot dk
Description:
------------
When I converted my pages to the PHP5 SAX XML parser, they
broke because of an apparent incompatibility. The
character data handler is called in a different pattern than in
PHP4. Before, it seemed to be called once per character
block. Now the buffer seems to be flushed before each block of
high-bit characters. This is unexpected and
(seemingly?) impossible to change.

Reproduce code:
---------------
<?php
// Element handlers are no-ops; only character data matters for this report.
function es() {}
function ee() {}
// Print each chunk of character data in brackets to show where it is split.
function cd($P, $D) {print "[$D]\n";}

#   $str = "UTF:æøå:UTF"; $strenc = "utf-8";
    $str = "ISO:???:ISO"; $strenc = "iso-8859-1";

    $buffer = "<?xml version=\"1.0\" encoding=\"$strenc\"?><global>$str</global>";

    $xml_parser = xml_parser_create();
#    xml_set_element_handler($xml_parser, "es", "ee");
    xml_set_character_data_handler($xml_parser, "cd");
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
    xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, "iso-8859-1");
    if (!xml_parse($xml_parser, $buffer)) {
        die(sprintf("TV import error: %s at line %d col %d\n%s",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser),
                    xml_get_current_column_number($xml_parser),
                    $buffer));
    }

    xml_parser_free($xml_parser);
?>


Expected result:
----------------
expected: [ISO:???:ISO] 
php4: [ISO:???:ISO] 
 

Actual result:
--------------
[ISO:] 
[???:ISO] 
 
 

 [2004-11-05 15:34 UTC] derick@php.net
This is a change, but nothing is wrong: a SAX parser just fires events. It may break up character data into several events, and this is normal behavior.
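
Since the handler may receive one text node in several pieces, code that needs the whole text can collect the pieces and join them when the element ends. The following is only a minimal sketch of that idea, not something taken from the report: the variable names ($chunks, $texts) are made up for illustration, it uses the UTF-8 variant of the test string with the default target encoding, and it assumes simple elements without mixed content.

```php
<?php
// Sketch only: buffer character data and emit it once per element.
// $chunks and $texts are illustrative names, not part of the report.

$chunks = [];   // pieces of text delivered so far for the current element
$texts  = [];   // completed text, one entry per closed element

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    // Start tag: begin with an empty buffer.
    function ($p, $name, $attrs) use (&$chunks) {
        $chunks = [];
    },
    // End tag: join whatever pieces arrived and store the result.
    function ($p, $name) use (&$chunks, &$texts) {
        $texts[] = implode('', $chunks);
        $chunks = [];
    }
);

xml_set_character_data_handler(
    $parser,
    // May be called one or more times per text node; just collect.
    function ($p, $data) use (&$chunks) {
        $chunks[] = $data;
    }
);

$buffer = "<?xml version=\"1.0\" encoding=\"utf-8\"?><global>UTF:æøå:UTF</global>";

if (!xml_parse($parser, $buffer, true)) {
    die(sprintf("XML error: %s at line %d\n",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)));
}

// The joined text is printed once, however the parser split it.
foreach ($texts as $text) {
    print "[$text]\n";
}
```

With this approach the result does not depend on how many times the character data handler fires, so the same code should behave the same way under PHP4 and PHP5.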
 