Bug #34591 XML parsing seems to skip some data
Submitted: 2005-09-21 22:59 UTC
Modified: 2005-09-22 13:58 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: priit at ww dot ee
Assigned:
Status: Not a bug
Package: XML related
PHP Version: 5.0.5
OS: Windows XP SP2
Private report: No
CVE-ID: None
 [2005-09-21 22:59 UTC] priit at ww dot ee
Description:
------------
XML parsing seems to skip some data, specifically the part of an element's text that comes before a non-ASCII character (shown as ? below).

The same code has worked perfectly on every PHP version from 4.1 onward that I have tested (up to 4.3.6 or 4.3.7).

Reproduce code:
---------------
<?xml version="1.0" encoding="ISO-8859-15"?>
  <rida>
   <tootaja_id>519</tootaja_id>
   <eesnimi>XxX</eesnimi>
   <perekonnanimi>YyY</perekonnanimi>
   <synniaeg>21.02.1900</synniaeg>
   <aadress>P?llu 1202-2, 10920 Tallinn</aadress>
   <haridustase>k?rgharidus</haridustase>
   <eriala>kaubandus?konoomika</eriala>
   <telefon>625 7700</telefon>
   <e_post>XxX.YyY@eee.ee</e_post>
   <ametijuhend_viit></ametijuhend_viit>
   <asutus>T??turuamet</asutus>
   <yksus_nimetus>Juhtkond</yksus_nimetus>
   <yksus_id>10</yksus_id>
   <prioriteet>1</prioriteet>
   <on_peatumine>0</on_peatumine>
  </rida>

Expected result:
----------------
aadress => P?llu 1202-2, 10920 Tallinn
haridustase => k?rgharidus
eriala => kaubandus?konoomika
asutus => T??turuamet

(I have left out the lines that were parsed correctly.)

The PHP code I use is based on a sample from the PHP homepage.

Actual result:
--------------
aadress => ?llu 1202-2, 10920 Tallinn
haridustase => ?rgharidus
eriala => ?konoomika
asutus => ??turuamet



 [2005-09-21 23:04 UTC] tony2001@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc.

If possible, make the script source available online and provide
a URL to it here. Try to avoid embedding huge scripts into the report.


 [2005-09-21 23:15 UTC] priit at ww dot ee
using php code:
$xml_parser = xml_parser_create('ISO-8859-1');
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

$data = fread($fp, filesize($file));
if(!xml_parse($xml_parser, $data, feof($fp)))
    {
    die(sprintf("XML error: %s at line %d",
    xml_error_string(xml_get_error_code($xml_parser)),
    xml_get_current_line_number($xml_parser)));
    }

xml_parser_free($xml_parser);
 [2005-09-21 23:17 UTC] sniper@php.net
Try using compatible charset.

 [2005-09-21 23:35 UTC] priit at ww dot ee
<?php
$file = 'failid/kontakt.xml';

function startElement($parser, $name, $attrs)
    {
    global $temp_name;
    $temp_name = $name;
    }

function endElement($parser, $name)
    {
    global $temp_name,$temp_value,$moodul;
    if($temp_name==$name) {$moodul .= "($temp_name : $temp_value)\n";}
    }

function characterData($parser, $data)
    {
    global $temp_value;
    $temp_value = $data;
    }

$moodul = '<PRE>';
$xml_parser = xml_parser_create('ISO-8859-1');
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) { die("could not open XML input"); }
$data = fread($fp, filesize($file));
if(!xml_parse($xml_parser, $data, feof($fp)))
    {
    die(sprintf("XML error: %s at line %d",
    xml_error_string(xml_get_error_code($xml_parser)),
    xml_get_current_line_number($xml_parser)));
    }

xml_parser_free($xml_parser);
$moodul .= '</PRE>';

echo $moodul;

?>
 [2005-09-21 23:38 UTC] priit at ww dot ee
The charset has nothing to do with it. I tried converting the input to Latin-1 and then to UTF-8, changing the xml_parser_create() charset accordingly each time, but the result was always the same. The characters in my example XML are compatible with ISO-8859-1, even though the XML declaration says ISO-8859-15. Changing the encoding declared in the XML itself made no difference either.
 [2005-09-22 13:37 UTC] priit at ww dot ee
...
 [2005-09-22 13:48 UTC] tony2001@php.net
Well, then please provide a really complete reproducing script.
I doubt that "$file = 'failid/kontakt.xml'; .." will work without the .xml file.
 [2005-09-22 13:58 UTC] marcot@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.

It's a bad idea to assume that all the data associated with a particular element is returned to you in a single function call to your characterData() handler, which is what you're doing here. What is happening is that characterData() is being called more than once for the same element, and you're overwriting the global $temp_value variable every time.

Making these changes to your code makes it work just fine for me on 5.1:

function endElement($parser, $name)
    {
    global $temp_name,$temp_value,$moodul;
    if($temp_name==$name) {$moodul .= "($temp_name : $temp_value)\n";}
    $temp_value = '';
    }

function characterData($parser, $data)
    {
    global $temp_value;
    $temp_value .= $data;
    }
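
For readers who want to try the accumulate-and-reset pattern described above in isolation, here is a minimal self-contained sketch. It is not the reporter's script: the parseElements() helper, the closure-based handlers and the sample XML are assumptions made up for illustration, and the code assumes PHP 7 or later. The sample text contains both a non-ASCII character and an entity reference, two places where a parser may hand the text to the handler in more than one piece.

```php
<?php
// Sketch only: parseElements() is a hypothetical helper, not part of ext/xml.
// It returns element name => complete text for every element with non-blank text.
function parseElements(string $xml): array
{
    $result = [];  // element name => full text content
    $buffer = '';  // text collected so far for the current element

    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$buffer) {
            $buffer = '';  // a new element starts with an empty buffer
        },
        function ($parser, $name) use (&$buffer, &$result) {
            $text = trim($buffer);
            if ($text !== '') {
                $result[$name] = $text;
            }
            $buffer = '';  // reset so the text does not leak into the next element
        }
    );

    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$buffer) {
            // The handler may fire several times per element: append, never overwrite.
            $buffer .= $data;
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }

    return $result;
}

// Made-up sample data; "\u{00FC}" is the UTF-8 letter u with diaeresis.
$xml = "<rida><nimi>J\u{00FC}ri &amp; Co</nimi><linn>Tallinn</linn></rida>";
print_r(parseElements($xml));
```

Because every chunk is appended to the buffer, the result does not depend on how many times the parser calls the character data handler for a given element. Replacing `.=` with `=` in that handler brings back the overwrite that the reply above identifies as the cause.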


Last updated: Thu Apr 25 21:01:36 2024 UTC