Bug #39565 xmlparser and accentuated letters
Submitted: 2006-11-21 02:16 UTC Modified: 2007-08-17 17:49 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: sotiwin at freemail dot hu Assigned:
Status: Wont fix Package: XML related
PHP Version: 5.2.0 OS: Windows XP
Private report: No CVE-ID: None

 [2006-11-21 02:16 UTC] sotiwin at freemail dot hu
Description:
------------
I want to parse an XML document containing accented letters, but the parser splits the text at the first accented letter of each character-data section, so the 'characterData' handler receives it in separate pieces.

I use Wamp5 with php 5.1.6.

Reproduce code:
---------------
<meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />
<?php
	function characterData($parser, $data)
	{
		  echo $data.'1';
	}
	$data='<?xml version="1.0" encoding="ISO-8859-1"?><book>Example??</book>';
	echo 'XML input:<br>'.$data;
	
	$xml_parser = xml_parser_create('ISO-8859-1');
	xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE,1);
	xml_set_character_data_handler($xml_parser,'characterData');
	
	echo '<br>Parsed data:<br>';
	if (!xml_parse($xml_parser, $data)) 
	{
		die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));	
						
	}
	xml_parser_free($xml_parser);
?>

Expected result:
----------------
characterData should have run only once.

XML input:
Example??
Parsed data:
Example??1


Actual result:
--------------
characterData runs twice.
XML input:
Example??
Parsed data:
Example1??1


History

 [2006-11-21 08:17 UTC] derick@php.net
This is expected behavior, although we should definitely document it better. Changing this to a documentation problem.
 [2006-11-21 12:27 UTC] sotiwin at freemail dot hu
Really? How could I parse an xml with accented data? Can I get the whole word in one piece?
 [2007-08-17 11:53 UTC] vrana@php.net
This behavior is wrong - all character data should be passed at once.
 [2007-08-17 13:31 UTC] jani@php.net
If it was changed we'd break BC. Documentation issue.
 [2007-08-17 16:19 UTC] vrana@php.net
Fortunately it wouldn't break BC.

Currently, if you want the whole string in one item, you have to concatenate the pieces yourself. If the whole string were returned at once, there would be nothing to concatenate, so the result would be the same.
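
A minimal sketch of the concatenation workaround mentioned above, not code from this report: the character data handler only appends to a buffer, and the text is treated as complete when the closing tag arrives. The helper name collectText() and the UTF-8 sample string (with "é" and "á" standing in for the accented letters lost from the original report) are assumptions for illustration. It resets the buffer on every start tag, so it only suits elements without mixed content.

```php
<?php
// Hypothetical helper (not from the report): returns the text of each
// element in document order, joining the chunks the parser delivers.
function collectText($xml)
{
    $buffer = '';      // chunks of the element currently being read
    $texts  = array(); // completed text, one entry per closed element

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$buffer) {
            $buffer = ''; // start tag: begin a fresh buffer
        },
        function ($parser, $name) use (&$buffer, &$texts) {
            $texts[] = $buffer; // end tag: the text is now complete
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$buffer) {
            $buffer .= $data; // may be called several times per text node
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }

    return $texts;
}

// Assumed sample input: "Example" followed by e-acute and a-acute in UTF-8.
$xml = '<?xml version="1.0" encoding="UTF-8"?><book>Example'
     . "\xC3\xA9\xC3\xA1" . '</book>';
$texts = collectText($xml);
echo $texts[0], "\n"; // the whole string in one piece
```

Because the handler never assumes how many times it is called, the result is the same whether the parser delivers the text in one chunk or several.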
 [2007-08-17 17:49 UTC] derick@php.net
We won't change this; it's simply how libxml2 works, and since that is the underlying library, that's how it's going to be. Fortunately, it is not *wrong* behavior, just unexpected.
 
Last updated: Fri Jan 24 11:01:24 2020 UTC