Bug #39565 xmlparser and accentuated letters
Submitted: 2006-11-21 02:16 UTC Modified: 2007-08-17 17:49 UTC
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: sotiwin at freemail dot hu Assigned:
Status: Wont fix Package: XML related
PHP Version: 5.2.0 OS: Windows XP
Private report: No CVE-ID: None


 [2006-11-21 02:16 UTC] sotiwin at freemail dot hu
I want to parse an XML document that contains accented letters, but the parser splits the text before the first accented letter of each element, so 'characterData' receives it in pieces.

I use Wamp5 with php 5.1.6.

Reproduce code:
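<meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />
<?php
	// Called by the parser for each chunk of character data.
	function characterData($parser, $data)
	{
		  echo $data.'1';
	}
	// "??" stands for the accented letters that were lost in this listing.
	$data='<?xml version="1.0" encoding="ISO-8859-1"?><book>Example??</book>';
	echo 'XML input:<br>'.$data;
	$xml_parser = xml_parser_create('ISO-8859-1');
	xml_set_character_data_handler($xml_parser, 'characterData');
	echo '<br>Parsed data:<br>';
	if (!xml_parse($xml_parser, $data)) 
		die(sprintf("XML error: %s at line %d",
			xml_error_string(xml_get_error_code($xml_parser)),
			xml_get_current_line_number($xml_parser)));
?>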

Expected result:
characterData should run only once.

XML input:
Parsed data:

Actual result:
characterData runs twice.
XML input:
Parsed data:


 [2006-11-21 08:17 UTC]
This is expected behavior, although we should definitely document it better. Changing this to a documentation problem.
 [2006-11-21 12:27 UTC] sotiwin at freemail dot hu
Really? How can I parse an XML document with accented data? Can I get the whole word in one piece?
 [2007-08-17 11:53 UTC]
This behavior is wrong - all character data should be passed at once.
 [2007-08-17 13:31 UTC]
If it was changed we'd break BC. Documentation issue.
 [2007-08-17 16:19 UTC]
Fortunately it wouldn't break BC.

Currently, if you want the whole string in one item, you have to concatenate the pieces yourself. If the whole string were returned at once, there would be nothing to concatenate, so the result would be the same.
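
As a rough illustration of the concatenation mentioned above, here is a minimal sketch that buffers the pieces passed to the character data handler and only treats the text as complete when the element closes. The class name `CharacterDataCollector`, its method names, and the sample input (with `\xE9\xE1` standing in for two accented letters) are assumptions made for this example, not something from the report. It also assumes elements contain only text, with no mixed content.

```php
<?php
// Sketch only: CharacterDataCollector is a hypothetical helper, not part of ext/xml.
class CharacterDataCollector
{
    /** @var string[] Complete text of each element, in closing order. */
    public $texts = array();

    /** @var string Pieces received so far for the current element. */
    private $buffer = '';

    public function startElement($parser, $name, $attrs)
    {
        // A new element starts: discard any text collected before it.
        $this->buffer = '';
    }

    public function characterData($parser, $data)
    {
        // May be called several times for one run of text, so just append.
        $this->buffer .= $data;
    }

    public function endElement($parser, $name)
    {
        // The element is closed, so the buffered text is now complete.
        if ($this->buffer !== '') {
            $this->texts[] = $this->buffer;
        }
        $this->buffer = '';
    }
}

// "\xE9" and "\xE1" are the ISO-8859-1 bytes for two accented letters (illustrative input).
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><book>Example\xE9\xE1</book>";

$collector = new CharacterDataCollector();
$parser = xml_parser_create('ISO-8859-1');
xml_set_element_handler($parser, array($collector, 'startElement'), array($collector, 'endElement'));
xml_set_character_data_handler($parser, array($collector, 'characterData'));

if (!xml_parse($parser, $xml, true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}

// One entry per element, however many chunks the parser delivered.
echo count($collector->texts), "\n";
```

With this approach the calling code no longer depends on how many times the handler is invoked, which is why returning the text in one piece or in several would give the same result.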
 [2007-08-17 17:49 UTC]
We won't change this. It's simply how libxml2 works, and since that is the underlying library, that's how it's going to be. Fortunately, it is not *wrong* behavior - just unexpected.
Last updated: Fri Aug 07 13:01:24 2020 UTC