Doc Bug #18387 Incorrect work with xml encoding
Submitted: 2002-07-17 03:55 UTC Modified: 2010-09-01 04:35 UTC
From: svazulia at tvc dot ru Assigned: k.schroeder (profile)
Status: Wont fix Package: Documentation problem
PHP Version: 4.2.1 OS: Windows (98,2000)
Private report: No CVE-ID: None

 [2002-07-17 03:55 UTC] svazulia at tvc dot ru
short script:
<?php
if(!$dom = domxml_open_file("b00000000001.xml")) {
  echo "Error while parsing the document\n";
  exit;
}
$root = $dom->document_element();
print_r($root);
?>
This returns "Error while parsing the document"
when the XML document declares an encoding, like this:
<?xml version="1.0" encoding='WINDOWS-1251' ?>
Without "encoding='WINDOWS-1251'" it works OK.

 [2002-07-17 13:48 UTC] flying at dom dot natm dot ru
I think this is expected behaviour. According to the XML specification, the only encodings that XML parsers must support are UTF-8 and UTF-16. By default, libxml2 supports those plus ISO-8859-1.
 So you should work around the problem by converting your XML documents to UTF-8 (you can also do this on the fly with the iconv() function).
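
A minimal sketch of that workaround, assuming the iconv extension is available and the source encoding is known in advance. The helper name xml_to_utf8() and the sample string are hypothetical and not part of PHP or domxml. Converting the bytes alone is not enough: the XML declaration still names the old encoding, so the sketch rewrites it as well.

```php
<?php
// Hypothetical helper (not part of PHP or domxml): re-encode an XML string
// to UTF-8 and make its XML declaration agree with the new encoding.
function xml_to_utf8($xml, $from_encoding)
{
    // iconv() returns false when the conversion fails.
    $utf8 = iconv($from_encoding, 'UTF-8', $xml);
    if ($utf8 === false) {
        return false;
    }
    // Replace the value of the encoding pseudo-attribute in the XML
    // declaration at the start of the document, keeping the quote style.
    return preg_replace(
        '/^(<\?xml\b[^>]*?\bencoding\s*=\s*)(["\'])[^"\']*\2/i',
        '${1}${2}UTF-8${2}',
        $utf8,
        1
    );
}

// Sample document: a Windows-1251 declaration followed by six
// Cyrillic letters given as raw Windows-1251 bytes.
$sample = "<?xml version=\"1.0\" encoding='WINDOWS-1251' ?>"
        . "<doc>\xCF\xF0\xE8\xE2\xE5\xF2</doc>";

$utf8_xml = xml_to_utf8($sample, 'WINDOWS-1251');
```

The resulting string can then be handed to a parser that works on strings instead of files, for example domxml_open_mem() in the PHP 4 domxml extension. If iconv itself is missing from the build, this sketch does not help.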
 
To the PHP developers: I think this bug should be moved to the "Documentation problem" category, because the documentation is missing an important note about the list of supported encodings and how to handle documents in other encodings.
 [2002-07-17 14:01 UTC] sniper@php.net
reclassified

 [2002-07-17 16:09 UTC] chregu@php.net
Just for the record:

From http://xmlsoft.org/encoding.html: 

Default supported encodings [by libxml2]

libxml has a set of default converters for the following encodings (located in encoding.c):

   1. UTF-8 is supported by default (null handlers)
   2. UTF-16, both little and big endian
   3. ISO-Latin-1 (ISO-8859-1) covering most western languages
   4. ASCII, useful mostly for saving
   5. HTML, a specific handler for the conversion of UTF-8 to ASCII with HTML predefined entities like &copy; for the Copyright sign.

chregu
 [2002-07-17 16:11 UTC] chregu@php.net
Oops, the text goes on even further on that page:

"More over when compiled on an Unix platform with iconv support the full set of encodings supported by iconv can be instantly be used by libxml. On a linux machine with glibc-2.1 the list of supported encodings and aliases fill 3 full pages, and include UCS-4, the full set of ISO-Latin encodings, and the various Japanese ones."

But that won't help the original poster, since he's using Windows.

chregu
 [2002-11-27 04:07 UTC] k.schroeder@php.net
Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.php.net/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to "Open".

Thank you for your interest in PHP.


Please try installing the complete Sablotron package (including iconv).

Regards, Kai
 [2010-09-01 04:35 UTC] k.schroeder@php.net
-Status: No Feedback +Status: Wont fix
 [2010-09-01 04:35 UTC] k.schroeder@php.net
We are sorry, but we can not support PHP 4 related problems anymore.
Momentum is gathering for PHP 6, and we think supporting PHP 4 will
lead to a waste of resources which we want to put into getting PHP 6
ready.

From the manual:

"This extension has been moved to the » PECL repository and is no longer bundled with PHP as of PHP 5.0.0."
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Dec 27 07:01:28 2024 UTC