Bug #39565 xmlparser and accentuated letters
Submitted: 2006-11-21 02:16 UTC Modified: 2007-08-17 17:49 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: sotiwin at freemail dot hu Assigned:
Status: Wont fix Package: XML related
PHP Version: 5.2.0 OS: Windows XP
Private report: No CVE-ID: None
 [2006-11-21 02:16 UTC] sotiwin at freemail dot hu
Description:
------------
I want to parse an XML document containing accented letters, but the parser splits each piece of character data just before its first accented letter, so the 'characterData' handler receives it in two parts.

I use WAMP5 with PHP 5.1.6.

Reproduce code:
---------------
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<?php
	// Called by the parser for each chunk of character data;
	// the trailing '1' marks the end of each call.
	function characterData($parser, $data)
	{
		echo $data.'1';
	}
	$data='<?xml version="1.0" encoding="ISO-8859-1"?><book>Example??</book>';
	echo 'XML input:<br>'.$data;

	$xml_parser = xml_parser_create('ISO-8859-1');
	xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE,1);
	xml_set_character_data_handler($xml_parser,'characterData');

	echo '<br>Parsed data:<br>';
	if (!xml_parse($xml_parser, $data))
	{
		die(sprintf("XML error: %s at line %d",
			xml_error_string(xml_get_error_code($xml_parser)),
			xml_get_current_line_number($xml_parser)));
	}
	xml_parser_free($xml_parser);
?>

Expected result:
----------------
characterData should be called only once.

XML input:
Example??
Parsed data:
Example??1


Actual result:
--------------
characterData is called twice.
XML input:
Example??
Parsed data:
Example1??1

History

 [2006-11-21 08:17 UTC] derick@php.net
This is expected behavior, although we should definitely document it better - changing this to a documentation problem.
 [2006-11-21 12:27 UTC] sotiwin at freemail dot hu
Really? How can I parse XML with accented data? Can I get the whole word in one piece?
 [2007-08-17 11:53 UTC] vrana@php.net
This behavior is wrong - all character data should be passed at once.
 [2007-08-17 13:31 UTC] jani@php.net
If it was changed we'd break BC. Documentation issue.
 [2007-08-17 16:19 UTC] vrana@php.net
Fortunately it wouldn't break BC.

Currently, if you want the whole string in one item, you need to concatenate the pieces yourself. If the whole string were returned at once, there would be nothing to concatenate, so the result would be the same.
 [2007-08-17 17:49 UTC] derick@php.net
We won't change this, it's simply how libxml2 works and as that is the underlying library that's how it's going to be. Fortunately, it is not *wrong* behavior - just unexpected.
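
The concatenation workaround mentioned in this thread could look roughly like the sketch below. It buffers every chunk passed to the character data handler and only emits the text when an element starts or ends. The class TextCollector, its members, and the function collectText() are hypothetical names made up for this illustration; only the xml_* functions come from the XML extension. The sketch assumes PHP 5.4 or later (short array syntax) and is not meant as the definitive way to do this.

```php
<?php
// Hypothetical helper (not part of ext/xml): joins the chunks that the
// character data handler receives into one string per text node.
final class TextCollector
{
    /** @var string text accumulated since the last element boundary */
    private $buffer = '';

    /** @var string[] completed text nodes, in document order */
    public $texts = [];

    public function startElement($parser, $name, $attrs)
    {
        // A child element starts: whatever was buffered is a finished text node.
        $this->flush();
    }

    public function characterData($parser, $data)
    {
        // May be called several times for one text node, so only append here.
        $this->buffer .= $data;
    }

    public function endElement($parser, $name)
    {
        $this->flush();
    }

    private function flush()
    {
        if ($this->buffer !== '') {
            $this->texts[] = $this->buffer;
            $this->buffer = '';
        }
    }
}

// Hypothetical wrapper: parses $xml and returns its text nodes as whole strings.
function collectText($xml)
{
    $collector = new TextCollector();
    $parser = xml_parser_create('ISO-8859-1'); // output encoding, as in the report
    xml_set_element_handler(
        $parser,
        [$collector, 'startElement'],
        [$collector, 'endElement']
    );
    xml_set_character_data_handler($parser, [$collector, 'characterData']);

    // Third argument true: this is the final (and only) chunk of input.
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
    return $collector->texts;
}
```

With the input from the report, collectText($data) should return the content of <book> as a single array entry regardless of how many times the handler was called, because the pieces are joined before they are stored.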
 
PHP Copyright © 2001-2020 The PHP Group
All rights reserved.
Last updated: Fri Jan 24 05:01:23 2020 UTC