Bug #42225 XML Parser or array is not return characters back before a accent
Submitted: 2007-08-06 20:35 UTC Modified: 2007-08-07 16:08 UTC
From: tcreamean at starsolutionsllc dot com Assigned:
Status: Not a bug Package: *XML functions
PHP Version: 5.2.4RC1 OS: gentoo
Private report: No CVE-ID: None
 [2007-08-06 20:35 UTC] tcreamean at starsolutionsllc dot com
Description:
------------
When I parse an XML file containing Latin accents, opened with fopen, words come back chopped off in PHP 5.2.3.

So Austrália comes back on the web page as "ália" in 5.2.3.

When I downgraded to 4.4.7 everything works: Austrália comes back as Austrália.


Reproduce code:
---------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<accessNumbers>
 <accessCntry>
   <countryName>Alemanha</countryName>
   <countryID>96</countryID>
   <dialingCode>49</dialingCode>
 </accessCntry>
 <accessCntry>
   <countryName>Argentina</countryName>
   <countryID>14</countryID>
   <dialingCode>54</dialingCode>
 </accessCntry>
 <accessCntry>
   <countryName>Austrália</countryName>
   <countryID>20</countryID>
   <dialingCode>61</dialingCode>
 </accessCntry>
</accessNumbers>


<?php
if (!($fp=@fopen("http://xml.cordiaip.com/?ixAcct=accessNumbersCountries&langId=3","r"))) die ("Couldn't open XML.");

$usercount=0;
$userdata=array();
$state='';

if (!($xml_parser = xml_parser_create('iso-8859-1'))) die("Couldn't create parser.");
//xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 0);
//xml_parser_set_option($xml_parser, XML_OPTION_SKIP_WHITE, 1);

function startElementHandler ($parser,$element,$attrib){
 global $usercount, $userdata, $state;
 switch ($element) {
  case "ACCESSCNTRY": {
   // The sample XML carries no ID attribute, so guard the lookup.
   $userdata[$usercount]["id"] = isset($attrib["ID"]) ? $attrib["ID"] : null;
   break;
  }
  default: {
   $state = $element;
   break;
  }
 }
}

function endElementHandler ($parser,$element){
 global $usercount, $userdata, $state;
  $state='';
  if($element=="ACCESSCNTRY") {
   $usercount++;
  }
}

function characterDataHandler ($parser, $data) {
 global $usercount, $userdata, $state;
  if (!$state) {
   return;
  }
  if ($state == "COUNTRYNAME") {
   $userdata[$usercount]["countryname"] = $data;
  }
  if ($state == "COUNTRYID") {
   $userdata[$usercount]["countryID"] = $data;
  }
}

xml_set_element_handler($xml_parser,"startElementHandler","endElementHandler");
xml_set_character_data_handler( $xml_parser, "characterDataHandler");

while( $data = fread($fp, 8192)){
  if(!xml_parse($xml_parser, $data, feof($fp))) {
   break;
  }
}

xml_parser_free($xml_parser);

for ($i=0; $i<$usercount; $i++) {
 $x = $i+1;
 echo $userdata[$i]["countryID"];
 echo $userdata[$i]["countryname"];
}

?>







 [2007-08-07 11:31 UTC] rrichards@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

A quick search of the bugs would have told you character data can be chunked, so might not come all at once to the handler.
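
To illustrate the point about chunked character data, here is a minimal sketch of a handler that buffers text until the element closes, rather than assuming one call per element. The inline XML string and the variable names `$current` and `$names` are assumptions made for this example and do not come from the report; the accented character is written as UTF-8 bytes so the snippet does not depend on the file's encoding.

```php
<?php
// Hypothetical sample document; the report's real feed was ISO-8859-1.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
     . "<accessNumbers><accessCntry>"
     . "<countryName>Austr\xC3\xA1lia</countryName>"
     . "</accessCntry></accessNumbers>";

$names   = array(); // completed country names
$current = null;    // text buffer for the open element, null outside it

$parser = xml_parser_create("UTF-8");

xml_set_element_handler(
    $parser,
    function ($parser, $element, $attrib) use (&$current) {
        // Case folding is on by default, so names arrive upper-cased.
        if ($element === "COUNTRYNAME") {
            $current = ""; // start a fresh buffer
        }
    },
    function ($parser, $element) use (&$current, &$names) {
        if ($element === "COUNTRYNAME") {
            $names[] = $current; // the element is complete only here
            $current = null;
        }
    }
);

xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$current) {
        if ($current !== null) {
            $current .= $data; // append: this may run several times per element
        }
    }
);

if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}

echo $names[0], "\n";
```

Storing the value in the end-element handler means the result is the same however many pieces the parser delivers the text in.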
 [2007-08-07 16:08 UTC] tcreamean at starsolutionsllc dot com
Hello rrichards,

Sorry, I did search several times and got nothing back, and I also pored over the manuals looking for a hint about this. I have never seen an array value get chunked on an accent before, so it was unexpected behavior. I now append each chunk to the array entry, concatenating the values to get back the whole value of the element, which arrives in two chunks. I don't know why this is expected behavior when 4.4.7 returns the whole value of countryname, but oh well, I am sure there is a good reason. Thanks for your help and the tips for tracking this down. Keep up the good work.

  if ($state == "COUNTRYNAME") {
   $userdata[$usercount]["countryname"] = $userdata[$usercount]["countryname"].$data;
  }