Bug #32547 DOMDocument->loadHTML() seems to break the (UTF-8 Russian) codepage
Submitted: 2005-04-02 18:58 UTC Modified: 2010-12-20 11:51 UTC
From: xlex0x835 at rambler dot ru Assigned: rrichards
Status: Not a bug Package: DOM XML related
PHP Version: 5.0.3 OS: Mac OS X 10.3, FreeBSD 5.3
Private report: No CVE-ID:
 [2005-04-02 18:58 UTC] xlex0x835 at rambler dot ru

 [2005-04-03 19:44 UTC] xlex0x835 at rambler dot ru
The cause seems to be found: if I put the <meta> tag 
right after the title, the document is parsed 
correctly.
Is this a libxml bug or a problem in the PHP bindings?
 [2005-04-04 08:40 UTC] chregu@php.net
Not a bug, IMHO. HTML 4 is not XML, so it doesn't know about the <?xml ?> processing instruction and doesn't apply that information. You have to use the meta tag in HTML 4 (loadHTML handles only HTML 4, not XHTML).

I may be wrong about that assumption, so please point me to the right specs if loadHTML should recognize it.

But anyway, this is not a PHP bug; it is basically a libxml2 "problem".
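
A minimal sketch of the meta-tag approach described above, assuming UTF-8 input with an ASCII title and Russian text in the body. The sample string and the helper name firstParagraphText() are hypothetical and are not taken from the original report.

```php
<?php
// Hypothetical sample text; any UTF-8 string would do.
$text = "Привет, мир";

// The charset is declared with a <meta> tag right after the title,
// before any non-ASCII content appears in the document.
$html = '<html><head><title>Test</title>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    . '</head><body><p>' . $text . '</p></body></html>';

// Hypothetical helper: parse the HTML and return the text of the first <p>.
function firstParagraphText($html)
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $nodes = $doc->getElementsByTagName('p');
    return $nodes->length > 0 ? $nodes->item(0)->textContent : '';
}

// The DOM returns strings as UTF-8, so this should echo the sample text unchanged.
echo firstParagraphText($html), "\n";
```

Without the meta tag, the parser has no charset declaration to go on and may fall back to a default encoding, which would be consistent with the garbled output reported here.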


 [2005-04-04 09:09 UTC] xlex0x835 at rambler dot ru
As for <xml>, please read more carefully. I said that 
I put that tag there just to have one correct source for both 
the loadHTML() and loadXML() methods.

As for libxml: thank you for confirming that the problem 
is in that library.
 [2005-04-04 09:14 UTC] tony2001@php.net
No PHP bug -> bogus.
 [2010-12-20 11:51 UTC] jani@php.net
-Package: Tidy +Package: DOM XML related
 