Bug #42225 XML parser or array does not return the characters before an accent
Submitted: 2007-08-06 20:35 UTC Modified: 2007-08-07 16:08 UTC
From: tcreamean at starsolutionsllc dot com Assigned:
Status: Not a bug Package: *XML functions
PHP Version: 5.2.4RC1 OS: gentoo
Private report: No CVE-ID: None


 [2007-08-06 20:35 UTC] tcreamean at starsolutionsllc dot com
When I parse an XML file (opened with fopen) that contains Latin accented characters, the word comes back chopped off in PHP 5.2.3.

So Austrália comes back on the web page as ália in 5.2.3.

When I downgraded to 4.4.7, everything works: Austrália comes back as Austrália.

Reproduce code:
<?xml version="1.0" encoding="ISO-8859-1"?>

// Closing braces and the lines marked "assumed" were missing from the report
// as captured; they are a best-guess reconstruction.
if (!($fp=@fopen("","r"))) die ("Couldn't open XML.");

if (!($xml_parser = xml_parser_create('iso-8859-1'))) die("Couldn't create parser.");
//xml_parser_set_option($fp, XML_OPTION_CASE_FOLDING, 0);
//xml_parser_set_option($fp, XML_OPTION_SKIP_WHITE, 1);

function startElementHandler ($parser,$element,$attrib){
 global $usercount, $userdata, $state;
 switch ($element) {
  case "ACCESSCNTRY" : {
   $userdata[$usercount]["id"] = $attrib["ID"];
   break; // assumed
  }
  default : {
   $state = $element;
  }
 }
}

function endElementHandler ($parser,$element){
 global $usercount, $userdata, $state;
 if($element=="ACCESSCNTRY") {
  $usercount++; // assumed: advance to the next record
 }
}

function characterDataHandler ($parser, $data) {
 global $usercount, $userdata, $state;
 if (!$state) {
  return; // assumed
 }
 // Plain assignment: each call overwrites what the previous call stored.
 if ($state == "COUNTRYNAME") {
  $userdata[$usercount]["countryname"] = $data;
 }
 if ($state == "COUNTRYID") {
  $userdata[$usercount]["countryID"] = $data;
 }
}

xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler"); // assumed
xml_set_character_data_handler( $xml_parser, "characterDataHandler");

while( $data = fread($fp, 8192)){
 if(!xml_parse($xml_parser, $data, feof($fp))) {
  die(xml_error_string(xml_get_error_code($xml_parser))); // assumed
 }
}

for ($i=0;$i<$usercount; $i++) {
 $x = $i+1;
 echo $userdata[$i]["countryID"];
 echo $userdata[$i]["countryname"];
}


 [2007-08-07 11:31 UTC]
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation and the instructions on how to report a bug.

A quick search of the bug database would have told you that character data can be chunked, so the text of one element might not arrive at the handler all at once.
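To illustrate the point about chunking, here is a minimal sketch, not code from the report. It assumes a made-up one-element document in UTF-8 (the report used ISO-8859-1) and feeds it to the parser in two pieces, much as a fread() loop might. The variable names $chunks and $name are hypothetical. How many times the handler fires depends on the underlying XML library, so the sketch records every piece it receives and appends rather than assigns.

```php
<?php
// Hypothetical sample document; the report's real XML file is not shown.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . "<COUNTRYNAME>Austr\xC3\xA1lia</COUNTRYNAME>";

$chunks = array();  // every piece the handler receives, in order
$name   = '';       // the accumulated element text

$parser = xml_parser_create('UTF-8');
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$chunks, &$name) {
        $chunks[] = $data;  // may be the whole text or only part of it
        $name    .= $data;  // append, never overwrite
    }
);

// Feed the document in two pieces, cutting just before the accented character.
$cut = strpos($xml, 'Austr') + strlen('Austr');
xml_parse($parser, substr($xml, 0, $cut), false);
xml_parse($parser, substr($xml, $cut), true);

echo count($chunks), " chunk(s), accumulated text: ", $name, "\n";
```

Whatever the chunk count turns out to be on a given build, joining the pieces always yields the full element text, which is why appending in the handler is the safe pattern.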
 [2007-08-07 16:08 UTC] tcreamean at starsolutionsllc dot com
Hello rrichards,

Sorry, I did search several times and got nothing back, and I also pored over the manuals looking for a hint about this. I have never seen this behavior before, with the value stored in an array getting chunked on an accent; it was unexpected behavior from a simple array assignment. So I now append to the array entry each time the handler is called, concatenating the values to return the true whole value of the element, which now arrives in two chunks. I don't know why this is expected behavior when 4.4.7 returns the whole value of countryname, but oh well, I am sure there is a good reason. Thanks for your help and the tips for tracking this down. Keep up the good work.

  if ($state == "COUNTRYNAME") {
   $userdata[$usercount]["countryname"] = $userdata[$usercount]["countryname"].$data;
  }
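A slightly more defensive variant of the concatenation workaround is sketched below, as one possible approach rather than the reporter's actual code. It collects text in a buffer while an element is open and only stores it when the end tag arrives, so nothing depends on how many times the character data handler is called. The sample XML, the LIST wrapper element, and the names $buffer and $current are assumptions made for illustration. The element names follow the report.

```php
<?php
// Hypothetical sample data; element names follow the report, values are made up.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<LIST>'
     . '<ACCESSCNTRY ID="1"><COUNTRYID>AU</COUNTRYID>'
     . "<COUNTRYNAME>Austr\xC3\xA1lia</COUNTRYNAME></ACCESSCNTRY>"
     . '<ACCESSCNTRY ID="2"><COUNTRYID>BR</COUNTRYID>'
     . '<COUNTRYNAME>Brasil</COUNTRYNAME></ACCESSCNTRY>'
     . '</LIST>';

$userdata = array();  // finished records
$current  = array();  // record currently being built
$buffer   = '';       // text collected for the element that is open now

$parser = xml_parser_create('UTF-8');

xml_set_element_handler(
    $parser,
    // Start tag: open a new record if needed and reset the text buffer.
    function ($parser, $element, $attrib) use (&$current, &$buffer) {
        if ($element === 'ACCESSCNTRY') {
            $current = array('id' => $attrib['ID']);
        }
        $buffer = '';
    },
    // End tag: the buffer now holds the complete text, however it was chunked.
    function ($parser, $element) use (&$userdata, &$current, &$buffer) {
        if ($element === 'COUNTRYNAME') {
            $current['countryname'] = $buffer;
        } elseif ($element === 'COUNTRYID') {
            $current['countryID'] = $buffer;
        } elseif ($element === 'ACCESSCNTRY') {
            $userdata[] = $current;
        }
        $buffer = '';
    }
);

xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$buffer) {
        $buffer .= $data;  // append each chunk
    }
);

if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}

foreach ($userdata as $row) {
    echo $row['id'], ' ', $row['countryID'], ' ', $row['countryname'], "\n";
}
```

Resetting the buffer on every start and end tag also keeps whitespace between elements from leaking into the stored values.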