Bug #39565 xmlparser and accentuated letters
Submitted: 2006-11-21 02:16 UTC Modified: 2007-08-17 17:49 UTC
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: sotiwin at freemail dot hu Assigned:
Status: Wont fix Package: XML related
PHP Version: 5.2.0 OS: Windows XP
Private report: No CVE-ID: None

 [2006-11-21 02:16 UTC] sotiwin at freemail dot hu
I want to parse an XML document that contains accented letters, but the parser breaks the character data of each text node at the first accented letter. As a result, the characterData handler receives the text in separate chunks.

I use Wamp5 with PHP 5.1.6.

Reproduce code:
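<meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />
<?php
function characterData($parser, $data) {
    echo $data . '1'; // the trailing "1" marks each handler call
}

$data = '<?xml version="1.0" encoding="ISO-8859-1"?><book>Example??</book>';
echo 'XML input:<br>' . $data;
$xml_parser = xml_parser_create('ISO-8859-1');
xml_set_character_data_handler($xml_parser, 'characterData');
echo '<br>Parsed data:<br>';
if (!xml_parse($xml_parser, $data, true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser)));
}
xml_parser_free($xml_parser);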

Expected result:
characterData should be called only once.

XML input:
Parsed data:

Actual result:
characterData is called twice.
XML input:
Parsed data:


 [2006-11-21 08:17 UTC]
This is expected behavior, although we should definitely document it better. Changing this to a documentation problem.
 [2006-11-21 12:27 UTC] sotiwin at freemail dot hu
Really? How can I parse XML that contains accented characters? Can I get the whole word in one piece?
 [2007-08-17 11:53 UTC]
This behavior is wrong - all character data should be passed at once.
 [2007-08-17 13:31 UTC]
If it were changed, we'd break BC (backward compatibility). Documentation issue.
 [2007-08-17 16:19 UTC]
Fortunately it wouldn't break BC.

Currently, if you want the whole string as one item, you have to concatenate the pieces yourself. If the whole string were returned at once, there would be nothing to concatenate, so the result would be the same.
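The concatenation workaround described in the comment above can be sketched in PHP as follows. This is a minimal sketch, assuming PHP 7 or later with the bundled xml extension; the class `TextCollector` and the function `collect_text()` are hypothetical names chosen for illustration, not part of PHP. Each open element gets its own buffer on a stack, every characterData call appends to the innermost buffer, and the joined text is recorded only when the element closes, so it no longer matters how many chunks the parser delivers.

```php
<?php
// Minimal sketch (hypothetical names): join the chunks that the
// character-data handler receives instead of using each chunk on its own.
class TextCollector
{
    private $stack = [];   // one text buffer per currently open element
    public $texts = [];    // [element name, full text], in closing order

    public function startElement($parser, $name, $attrs)
    {
        $this->stack[] = '';  // fresh buffer for the element just opened
    }

    public function characterData($parser, $data)
    {
        // May be called several times for one text node: append each chunk.
        if ($this->stack) {
            $this->stack[count($this->stack) - 1] .= $data;
        }
    }

    public function endElement($parser, $name)
    {
        // The element is closed, so its buffer now holds the whole text.
        $this->texts[] = [$name, array_pop($this->stack)];
    }
}

function collect_text($xml, $encoding = 'UTF-8')
{
    $collector = new TextCollector();
    // $encoding is the output (target) encoding of the data passed to handlers.
    $parser = xml_parser_create($encoding);
    // Keep element names as written instead of upper-casing them (the default).
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler($parser, [$collector, 'startElement'], [$collector, 'endElement']);
    xml_set_character_data_handler($parser, [$collector, 'characterData']);
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf('XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
    return $collector->texts;
}
```

For example, `collect_text('<book>Caf&#233; cr&#232;me</book>')` should return a single entry for `book` containing the full accented string, even if the handler is invoked several times for that text node. Text that appears after a child element is appended to the parent's buffer, so mixed content is joined per element as well.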
 [2007-08-17 17:49 UTC]
We won't change this; it's simply how libxml2 works, and since that is the underlying library, that's how it's going to be. Fortunately, it is not *wrong* behavior, just unexpected.
PHP Copyright © 2001-2020 The PHP Group
All rights reserved.
Last updated: Fri Aug 14 21:01:24 2020 UTC