Doc Bug #27808 xml_parse() chokes on the UTF-8 BOM
Submitted: 2004-03-31 13:00 UTC Modified: 2004-09-17 15:30 UTC
From: jcalvert at gmx dot net Assigned:
Status: Closed Package: Documentation problem
PHP Version: 5.0.0RC1 OS: Debian Sid
Private report: No CVE-ID: None
 [2004-03-31 13:00 UTC] jcalvert at gmx dot net
Description:
------------
In PHP 4, parsing a UTF-8 file that begins with the byte order mark (\xEF\xBB\xBF) works just fine. In PHP 5.0.0RC1, xml_parse() fails with an error message saying the string did not contain any XML data. Stripping the byte order mark before calling the function yields the expected result.
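
The workaround mentioned above (stripping the three bytes before parsing) could be sketched roughly as follows. The helper name `strip_utf8_bom` and the sample document are hypothetical, and the scalar type declarations assume PHP 7 or later; this is an illustration, not code from the report.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 byte order mark (EF BB BF), if any.
function strip_utf8_bom(string $data): string
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        return substr($data, 3);
    }
    return $data;
}

// Sample document (an assumption for illustration) that starts with the BOM.
$xml = "\xEF\xBB\xBF" . '<?xml version="1.0" encoding="UTF-8"?><root>text</root>';

$parser = xml_parser_create('UTF-8');

// Parse the BOM-free string; the third argument marks it as the final chunk.
$ok = xml_parse($parser, strip_utf8_bom($xml), true);

if (!$ok) {
    // Report the parser's own error message on failure.
    echo xml_error_string(xml_get_error_code($parser)), "\n";
}
```

Data without a BOM passes through the helper unchanged, so it can be applied to every input.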

libxml2* version 2.6.7-1



 [2004-03-31 20:23 UTC] moriyoshi@php.net
Corrected summary.

1. For the sake of backwards compatibility, xml_parser_create() with no arguments generates a parser that only recognises ISO-8859-1.

2. If "UTF-8" is passed as the "encoding" argument, the parser backed by libxml assumes that any given XML document is encoded in plain UTF-8, where no BOM (byte order mark) is allowed.

3. If "" (an empty string) is passed, the parser attempts to identify the document's encoding by looking at the leading 3 or 4 bytes. In this case a BOM must be present. This might fix your problem.

It seems the third feature is not documented yet, so I'm marking this as a documentation problem.
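
A minimal sketch of the third option, assuming a build of the XML extension that accepts an empty string for the "encoding" argument. The sample document and the `$text` accumulator are made up for illustration, and the closure syntax assumes PHP 5.3 or later.

```php
<?php
// Sample UTF-8 document (an assumption) that starts with the BOM
// and contains one non-ASCII character, U+00E9.
$xml = "\xEF\xBB\xBF"
     . '<?xml version="1.0" encoding="UTF-8"?><root>caf' . "\xC3\xA9" . '</root>';

// An empty string asks the parser to detect the source encoding itself.
$parser = xml_parser_create('');

// Deliver character data to handlers as UTF-8.
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');

// Collect all character data so the result can be inspected afterwards.
$text = '';
xml_set_character_data_handler($parser, function ($parser, $data) use (&$text) {
    $text .= $data;
});

$ok = xml_parse($parser, $xml, true);

if (!$ok) {
    echo xml_error_string(xml_get_error_code($parser)), "\n";
}
```

Passing 'UTF-8' instead of '' in the `xml_parser_create()` call gives the second case from the list above, which is the one the original report ran into.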

 [2004-09-17 15:30 UTC] vrana@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.

"If empty string is passed, the parser attempts to identify which encoding the document is encoded in by looking at the heading 3 or 4 bytes."
 