Bug #37544 Breaks national encoding
Submitted: 2006-05-21 16:27 UTC Modified: 2006-05-21 16:31 UTC
From: php at juzna dot com Assigned:
Status: Not a bug Package: XML related
PHP Version: 5.1.4 OS: WinXP
Private report: No CVE-ID: None
 [2006-05-21 16:27 UTC] php at juzna dot com
Description:
------------
XML parsing breaks some national encodings, although there shouldn't be any problem with them.

Reproduce code:
---------------
$xml = file_get_contents('http://xml.vht.cz/demo.xml');

$xml_parser = xml_parser_create();
xml_set_character_data_handler($xml_parser, "characterData");
xml_parse($xml_parser,$xml,true);
xml_parser_free($xml_parser);

function characterData($parser, $data) {
	$data = trim($data);
	if($data) var_dump($data);
}


Expected result:
----------------
Expected original data with original encoding (like this):

Sutomore - hotel Zlatna obala
Jiří Buček - CK VHT
Hotelový komplex Zlatna Obala je umístěn ve vzdálenosti cca 500 m od Sutomore. Má vlastní pláž v prostředí piniového háje.<br>


Actual result:
--------------
Broken encoding (absolutely wrong characters)

string(29) "Sutomore - hotel Zlatna obala"
string(2) "Ji"
string(20) "ří Buček - CK VHT"
string(7) "Hotelov"


 [2006-05-21 16:31 UTC] derick@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.

Not a bug, as the XML parser always outputs UTF-8. Use iconv() to convert the data back into the encoding that you want.
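
The conversion suggested above might look roughly like the following sketch. It is not taken from the report: the helper name xmlTextAs, the sample XML string, and the ISO-8859-2 target are assumptions chosen for illustration (the report does not say which encoding the reporter wanted, and demo.xml is not reproduced here). The sketch also buffers the character data before converting, because the handler can be called several times for one text node, as the "Ji" / "ří Buček" split in the actual result shows. The closure requires PHP 5.3 or later.

```php
<?php
// Hypothetical helper: collects all character data from $xml and returns it
// converted from UTF-8 (the parser's output) to $targetEncoding.
function xmlTextAs($xml, $targetEncoding) {
    $buffer = '';
    $parser = xml_parser_create();
    // The handler may fire several times per text node, so append the pieces
    // instead of treating each call as a complete string.
    xml_set_character_data_handler($parser, function ($parser, $data) use (&$buffer) {
        $buffer .= $data;
    });
    if (!xml_parse($parser, $xml, true)) {
        return false; // not well-formed
    }
    // Convert the complete string once, after parsing has finished.
    return iconv('UTF-8', $targetEncoding, trim($buffer));
}

// Assumed sample input: "Jiří Buček - CK VHT" written as explicit UTF-8 bytes
// so the example does not depend on how this file itself is saved.
$xml = "<tour><guide>Ji\xC5\x99\xC3\xAD Bu\xC4\x8Dek - CK VHT</guide></tour>";

$utf8   = xmlTextAs($xml, 'UTF-8');      // unchanged parser output
$latin2 = xmlTextAs($xml, 'ISO-8859-2'); // assumed target encoding
```

Converting once on the buffered string, rather than inside the handler, avoids calling iconv() on a chunk that might end in the middle of a multi-byte character.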
 