Bug #34664 DOMDocument::saveXML() result cannot be loaded using loadXML()
Submitted: 2005-09-27 21:38 UTC Modified: 2005-09-29 14:22 UTC
From: bugs dot php dot net at webdevelopers dot cz Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.0.5 OS: Linux version 2.6.6
Private report: No CVE-ID: None
 [2005-09-27 21:38 UTC] bugs dot php dot net at webdevelopers dot cz
Description:
------------
HTML imported using DOMDocument::loadHTML() is not saved as valid XML using DOMDocument::saveXML()

Reproduce code:
---------------
header('Content-Type: text/plain; charset=UTF-8');

$d=new DOMDocument('1.0', 'UTF-8');
if ($d->loadHTML('<html><body><script><!-- var a=1; a--; --></script></body></html>')) {
  echo "loadHTML: OK\n";
  echo $d->loadXML($d->saveXML());
}

Expected result:
----------------
loadHTML: OK
1


Actual result:
--------------
loadHTML: OK
<br />
<b>Warning</b>:  DOMDocument::loadXML() [<a href='function.loadXML'>function.loadXML</a>]: Comment not terminated 
&lt;!-- var a=1; a in Entity, line: 3 in <b>test.php</b> on line <b>9</b><br />

 [2005-09-27 21:43 UTC] bugs dot php dot net at webdevelopers dot cz
Sorry for the incorrect bug title:
"invalid feof() results due to Zend's caching of user stream wrappers"

My browser's autocompletion prefilled it and I didn't notice... ;-)
 [2005-09-28 05:28 UTC] rasmus@php.net
What exactly is the bug here?  You can't have -- in an HTML/XML comment and the warning message is telling you exactly that.  Remove the a-- and it works just fine.
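
The point about `--` can be checked in isolation. The following is a minimal sketch, not part of the original report: `isWellFormed()` is a hypothetical helper name, and the two test strings are made up for illustration. It assumes the DOM extension is available and relies only on loadXML() returning false for input that is not well-formed.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: true if the string parses as well-formed XML.
function isWellFormed(string $xml): bool
{
    libxml_clear_errors();
    $d = new DOMDocument();
    return $d->loadXML($xml) !== false;
}

// A comment containing "--" (here from the "a--" decrement) is not allowed in XML.
var_dump(isWellFormed('<r><!-- var a=1; a--; --></r>'));

// The same comment without a double hyphen.
var_dump(isWellFormed('<r><!-- var a=1; a-=1; --></r>'));
```

The exact libxml error text for the first case varies between libxml2 versions, so the sketch only looks at the boolean result.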
 [2005-09-28 23:02 UTC] rrichards@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

HTML is not XML (hence the different I/O methods), so you can't expect things to always be fixed when trying to do a conversion like that.
 [2005-09-29 13:14 UTC] bugs dot php dot net at webdevelopers dot cz
The bug is that I have a DOM document that was imported from HTML (actually, I only have the object that holds the DOM, and I may not be aware of how it was created...).

When I have a valid DOM document, I expect (or I thought I could expect) that $d->saveXML() will produce valid XML and that the result can be loaded again with loadXML() without problems... This is not the case.

The following function may fail under certain circumstances when the DOMDocument was created using loadHTML():

function test(DOMDocument $d) {
  echo DOMDocument::loadXML($d->saveXML());
}

Isn't it confusing that the programmer cannot be sure that the output of saveXML() is compatible with loadXML()?

---

Yesterday, when I was playing with loadHTML() followed by saveXML(), it even produced XML like this:
<html xmlns="someNS" xmlns="someNS">...</html>
- it has a duplicate xmlns attribute, which is not well-formed XML, so saveXML() is more like saveSometimesXML(). Apparently, when the HTML is imported, xmlns is treated as an ordinary attribute, and on saving, a namespace declaration is added alongside the existing xmlns attribute.

It looks like saveXML() is not producing XML despite the name of the function. I think it is a bug. Don't you think? Or is this the intended behavior, and the function's name is just confusing me?
 [2005-09-29 14:22 UTC] rrichards@php.net
The name is just confusing you. saveXML doesn't magically produce well-formed XML from non-XML data (i.e. HTML). It just enforces that the output is well-formed and otherwise issues an error. If you don't know the type of document, then check the DOMDocument node type:
XML_DOCUMENT_NODE - XML data
XML_HTML_DOCUMENT_NODE - HTML data
and use the correct save method. The only way saveXML() can be safely used with HTML is if the HTML is really XHTML (which could be loaded with loadXML()); otherwise results will vary.
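
The type check suggested above could look roughly like the following. This is only a sketch under stated assumptions: `serializeDocument()` is a hypothetical helper name that does not appear in the report, and the sample markup is made up. It uses the document's nodeType property together with the XML_DOCUMENT_NODE and XML_HTML_DOCUMENT_NODE constants named above.

```php
<?php
// Hypothetical helper: choose the save method that matches how the
// document was loaded, based on the document node type.
function serializeDocument(DOMDocument $d): string
{
    $out = ($d->nodeType === XML_HTML_DOCUMENT_NODE)
        ? $d->saveHTML()   // document was built by the HTML parser
        : $d->saveXML();   // XML_DOCUMENT_NODE: regular XML document

    if ($out === false) {
        throw new RuntimeException('Serialization failed');
    }
    return $out;
}

$html = new DOMDocument();
$html->loadHTML('<html><body><p>Hello</p></body></html>');

$xml = new DOMDocument();
$xml->loadXML('<root><item>Hello</item></root>');

echo serializeDocument($html), "\n";
echo serializeDocument($xml), "\n";
```

A caller that receives a DOMDocument of unknown origin, as in the test() function earlier in this thread, could pass it through such a helper instead of calling saveXML() unconditionally.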
 