Bug #30692 Behaviour change in SAX causes breakage from php4 -> php5
Submitted: 2004-11-05 15:24 UTC Modified: 2004-11-05 15:34 UTC
From: chrivers at iversen-net dot dk Assigned:
Status: Not a bug Package: XML related
PHP Version: 5.0.2 OS: Linux 2.6.5, Debian Sarge
Private report: No CVE-ID: None


 [2004-11-05 15:24 UTC] chrivers at iversen-net dot dk
When converting my pages to the PHP5 SAX XML parser, they
broke because of an apparent incompatibility. The
character-data handler is called in a different pattern than in
PHP4. Before, it seemed to be called once per block of character
data. Now the buffer seems to be flushed before each run of
high-bit (non-ASCII) characters. This is unexpected and
(seemingly?) impossible to change.

Reproduce code:
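<?php
// Element handlers (not registered below; the call is commented out).
function es($parser, $name, $attrs) {}
function ee($parser, $name) {}

// Character-data handler: print each chunk the parser delivers, in brackets.
function cd($parser, $data) { print "[$data]\n"; }

#  $str = "UTF:æøå:UTF"; $strenc = "utf-8";
   $str = "ISO:\xE6\xF8\xE5:ISO"; $strenc = "iso-8859-1";  // "æøå" as ISO-8859-1 bytes

$buffer = "<?xml version=\"1.0\" encoding=\"$strenc\"?><global>$str</global>";

$xml_parser = xml_parser_create();
# xml_set_element_handler($xml_parser, "es", "ee");
xml_set_character_data_handler($xml_parser, "cd");
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, "iso-8859-1");
if (!xml_parse($xml_parser, $buffer, true)) {
    die(sprintf("TV import error: %s at line %d col %d\n",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser),
        xml_get_current_column_number($xml_parser)));
}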


Expected result:
expected: [ISO:æøå:ISO]
php4: [ISO:æøå:ISO]

Actual result:


 [2004-11-05 15:34 UTC]
This is a change, but not a bug: a SAX parser just fires events, and it is free to split a run of character data across several handler calls. This is normal behavior.
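A common way to cope with split character data is to append every chunk to a buffer and treat the text as complete only at the next element boundary. The sketch below is an illustration under stated assumptions, not code from this thread: the variable names `$pending`, `$nodes` and `$flush` are hypothetical, the file is assumed to be saved as UTF-8, and the output encoding is set to UTF-8 so the bytes pass through unchanged.

```php
<?php
// Sketch: buffer character data and treat it as complete only at an
// element boundary, so the result no longer depends on how the parser
// splits the text. $pending, $nodes and $flush are hypothetical names.
$pending = '';  // text received since the last element boundary
$nodes   = [];  // complete text nodes, in document order

// Move any buffered text into $nodes and reset the buffer.
$flush = function () use (&$pending, &$nodes) {
    if ($pending !== '') {
        $nodes[] = $pending;
        $pending = '';
    }
};

$str    = "UTF:æøå:UTF";  // assumes this file is saved as UTF-8
$buffer = "<?xml version=\"1.0\" encoding=\"utf-8\"?><global>$str</global>";

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, "UTF-8");

// Both element start and element end close the preceding text node.
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use ($flush) { $flush(); },
    function ($p, $name) use ($flush) { $flush(); }
);

// May be called several times for one text node; just append each chunk.
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$pending) { $pending .= $data; }
);

if (!xml_parse($parser, $buffer, true)) {
    die(sprintf("XML error: %s at line %d col %d\n",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser)));
}
$flush();  // pick up any text still buffered after parsing ends

foreach ($nodes as $text) {
    print "[$text]\n";
}
```

Flushing on element start as well as element end keeps mixed content (text before and after a child element) as separate nodes instead of merging them. Whitespace between elements also arrives as character data, so a real importer may want to skip whitespace-only nodes.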
PHP Copyright © 2001-2022 The PHP Group
All rights reserved.
Last updated: Sun Dec 04 17:05:52 2022 UTC