Bug #42225 XML parser or array does not return characters before an accent
Submitted: 2007-08-06 20:35 UTC Modified: 2007-08-07 16:08 UTC
From: tcreamean at starsolutionsllc dot com Assigned:
Status: Not a bug Package: *XML functions
PHP Version: 5.2.4RC1 OS: gentoo
Private report: No CVE-ID: None
 [2007-08-06 20:35 UTC] tcreamean at starsolutionsllc dot com
Description:
------------
When I parse an XML file (opened with fopen) that contains Latin accented characters, the word comes back chopped off in PHP 5.2.3.

So Austrália comes back on the web page as ália in 5.2.3.

When I downgraded to 4.4.7, everything worked: Austrália comes back as Austrália.


Reproduce code:
---------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<accessNumbers>
 <accessCntry>
   <countryName>Alemanha</countryName>
   <countryID>96</countryID>
   <dialingCode>49</dialingCode>
 </accessCntry>
 <accessCntry>
   <countryName>Argentina</countryName>
   <countryID>14</countryID>
   <dialingCode>54</dialingCode>
 </accessCntry>
 <accessCntry>
   <countryName>Austrália</countryName>
   <countryID>20</countryID>
   <dialingCode>61</dialingCode>
 </accessCntry>
</accessNumbers>


<?php
if (!($fp=@fopen("http://xml.cordiaip.com/?ixAcct=accessNumbersCountries&langId=3","r"))) die ("Couldn't open XML.");

$usercount=0;
$userdata=array();
$state='';

if (!($xml_parser = xml_parser_create('iso-8859-1'))) die("Couldn't create parser.");
//xml_parser_set_option($fp, XML_OPTION_CASE_FOLDING, 0);
//xml_parser_set_option($fp, XML_OPTION_SKIP_WHITE, 1);

function startElementHandler ($parser,$element,$attrib){
 global $usercount, $userdata, $state;
       switch ($element) {
         case $element=="ACCESSCNTRY" : {
                $userdata[$usercount]["id"] = $attrib["ID"];
                break;
         }
         default : {
          $state = $element;
          break;
         }
       }
}

function endElementHandler ($parser,$element){
 global $usercount, $userdata, $state;
  $state='';
  if($element=="ACCESSCNTRY") {
   $usercount++;
  }
}

function characterDataHandler ($parser, $data) {
 global $usercount, $userdata, $state;
  if (!$state) {
   return;
  }
  if ($state == "COUNTRYNAME") {
   $userdata[$usercount]["countryname"] = $data;
  }
  if ($state == "COUNTRYID") {
   $userdata[$usercount]["countryID"] = $data;
  }
}

xml_set_element_handler($xml_parser,"startElementHandler","endElementHandler");
xml_set_character_data_handler( $xml_parser, "characterDataHandler");

while( $data = fread($fp, 8192)){
  if(!xml_parse($xml_parser, $data, feof($fp))) {
   break;
  }
}

xml_parser_free($xml_parser);

for ($i=0;$i<$usercount; $i++) {
  $x = $i+1;
  echo $userdata[$i]["countryID"];
  echo $userdata[$i]["countryname"];
}

?>







 [2007-08-07 11:31 UTC] rrichards@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

A quick search of the bugs would have told you character data can be chunked, so might not come all at once to the handler.
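
The chunking described above can be observed directly by recording every call to the character-data handler. The following is only a minimal sketch: the sample document and the `$chunks` variable are hypothetical, the source file is assumed to be saved as UTF-8, and how many calls the parser makes for one text node depends on the underlying XML library, so no particular count is guaranteed.

```php
<?php
// Hypothetical sample document; the element name mirrors the report above.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<countryName>Austrália</countryName>';

$chunks = array();  // one entry per character-data callback

$parser = xml_parser_create('UTF-8');
xml_set_character_data_handler($parser, function ($parser, $data) use (&$chunks) {
    $chunks[] = $data;  // record each piece exactly as it was delivered
});
xml_parse($parser, $xml, true);
unset($parser);

// The number of calls may be 1 or more; only the concatenation is stable.
printf("handler calls: %d\n", count($chunks));
echo implode('', $chunks), "\n";
```

If the call count is greater than one, a handler that assigns `$data` instead of appending it keeps only the last piece.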
 [2007-08-07 16:08 UTC] tcreamean at starsolutionsllc dot com
Hello rrichards,

Sorry, I did search several times and got nothing back, and I also pored over the manuals looking for a hint about this. I have never seen this behavior before, with an array value getting chunked on an accent; it was unexpected behavior from an array call. So I call the array twice and concatenate the values to return the true whole value of the element, which now arrives in two chunks. I don't know why this is expected behavior when 4.4.7 returns the whole value of countryname. Oh well, I am sure there is a good reason. Thanks for your help and tips for tracking this down. Keep up the good work.

  if ($state == "COUNTRYNAME") {
   $userdata[$usercount]["countryname"] = $userdata[$usercount]["countryname"].$data;
  }
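
A more general variant of this workaround is to collect text in a buffer and store it only when the closing tag arrives, so it does not matter how many pieces the parser delivers. This is a sketch under assumptions, not the reporter's actual code: the inline sample data, the closures, and the names `$buffer`, `$current`, and `$countries` are hypothetical, and case folding is switched off so element names keep their original case.

```php
<?php
// Hypothetical sample data modelled on the XML in the report.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<accessNumbers>'
     . '<accessCntry><countryName>Alemanha</countryName><countryID>96</countryID></accessCntry>'
     . '<accessCntry><countryName>Austrália</countryName><countryID>20</countryID></accessCntry>'
     . '</accessNumbers>';

$countries = array();  // finished records
$current   = array();  // record being built
$buffer    = '';       // text collected for the element currently open

$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attribs) use (&$buffer) {
        $buffer = '';  // start collecting afresh for each element
    },
    function ($parser, $name) use (&$buffer, &$current, &$countries) {
        if ($name === 'countryName' || $name === 'countryID') {
            $current[$name] = trim($buffer);  // the text is complete only now
        } elseif ($name === 'accessCntry') {
            $countries[] = $current;
            $current = array();
        }
        $buffer = '';
    }
);
xml_set_character_data_handler($parser, function ($parser, $data) use (&$buffer) {
    $buffer .= $data;  // append: one text node may arrive in several calls
});

if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}
unset($parser);

foreach ($countries as $c) {
    echo $c['countryID'], ' ', $c['countryName'], "\n";
}
```

Storing the value in the end-element handler also avoids reading an array key before it has been set, which the in-place concatenation does on the first chunk.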
 