Bug #66850 Different behaviors of loadXML
Submitted: 2014-03-07 21:12 UTC
Modified: 2014-03-11 16:33 UTC
From: goetas at lignano dot it
Assigned:
Status: Not a bug
Package: DOM XML related
PHP Version: 5.5.10
OS: windows/linux (ubuntu)
Private report: No
CVE-ID: None
 [2014-03-07 21:12 UTC] goetas at lignano dot it
Description:
------------
If an XML document is not well formed with respect to namespaces, for example:

<div x:attr="test">foo</div>

(the namespace declaration for the "x" prefix is missing)

the DOM extension behaves differently on Windows and on some versions of Ubuntu, all running PHP 5.5.10 or 5.3.28.


DOMDocument::loadXML always returns true (with a warning, on every OS),

but DOMDocument::saveXML returns different output on different OSes.


Windows:
DOM/XML API Version:20031129, libxml: 2.9.0


Linux:
DOM/XML API Version: 20031129, libxml: 2.7.6


Test script:
---------------
$a = new DOMDocument('1.0', 'UTF-8');
var_dump($a->loadXML('<div x:attr="test">foo</div>')); // returns true
echo $a->saveXML();


Expected result:
----------------
<br />
<b>Warning</b>:  DOMDocument::loadXML(): Namespace prefix x for attr on div is not defined in Entity, line: 1 in <b>[...][...]</b> on line <b>2</b><br />
bool(true) <<-- I do not know whether it should be true or not
<?xml version="1.0"?>
<div x:attr="test">foo</div>


(xml looks good, all nodes preserved)

Actual result:
--------------
<br />
<b>Warning</b>:  DOMDocument::loadXML(): Namespace prefix x for attr on div is not defined in Entity, line: 1 in <b>[...][...]</b> on line <b>2</b><br />
bool(true)
<?xml version="1.0"?>
<div attr="test">foo</div>

(x: prefix removed)

 [2014-03-11 10:06 UTC] ab@php.net
-Status: Open +Status: Not a bug
 [2014-03-11 10:06 UTC] ab@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

There are two reasons why it's not a bug:

- Loading an invalid XML document (or, say, some HTML4) always has an unpredictable effect, especially if no validation took place beforehand.

- What you describe as a difference in handling depends on the libxml version; it has nothing to do with the OS. When different dependency libraries are used, some discrepancy is even to be expected.

Depending on what you do, you can always use the latest libxml by doing a custom build.

Thanks
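
Since the difference is attributed to the libxml version rather than the OS, it can help to print that version on each machine before comparing output. The following is a minimal sketch using the LIBXML_VERSION and LIBXML_DOTTED_VERSION constants; the helper name libxmlAtLeast() is hypothetical and only there for illustration.

```php
<?php
// LIBXML_DOTTED_VERSION is a string such as "2.9.0";
// LIBXML_VERSION is the same version as an integer such as 20900.
echo 'libxml version: ', LIBXML_DOTTED_VERSION, ' (', LIBXML_VERSION, ')', PHP_EOL;

// Hypothetical helper: true when the libxml in use is at least $minimum.
function libxmlAtLeast($minimum)
{
    return version_compare(LIBXML_DOTTED_VERSION, $minimum, '>=');
}

var_dump(libxmlAtLeast('2.9.0'));
```

Running this on both systems shows whether they really use the same libxml before any saveXML() output is compared.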
 [2014-03-11 12:59 UTC] goetas at lignano dot it
But if it is not a bug, why does loadXML() return true?
 [2014-03-11 14:22 UTC] ab@php.net
You can see the relevant code here http://lxr.php.net/xref/PHP_5_6/ext/dom/document.c#dom_document_parser . Generally we rely on libxml: if it could parse the input, the precondition for the DOM extension to continue working is met.
 [2014-03-11 14:40 UTC] goetas at lignano dot it
Whether "DOMDocument::loadXML" worked properly should be decided by the "loadXML" return value alone.

It should not depend on warnings or on internal libxml issues.

Currently I have to check whether "loadXML" returns true and also whether "libxml_get_errors" is empty...
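
The double check described above can be wrapped in one place. This is only a sketch of that workaround, not a change to loadXML itself: the function name loadXmlStrict() is hypothetical, and it treats anything libxml reports (including the undefined-prefix message) as a failure.

```php
<?php
// Hypothetical helper: returns the DOMDocument only when loadXML() succeeded
// AND libxml reported nothing at all; returns null otherwise.
function loadXmlStrict($xml)
{
    // Collect libxml messages instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument('1.0', 'UTF-8');
    $loaded = $doc->loadXML($xml);
    $errors = libxml_get_errors();

    // Leave the global libxml error state as we found it.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ($loaded && count($errors) === 0) ? $doc : null;
}

var_dump(loadXmlStrict('<div>foo</div>') !== null);
var_dump(loadXmlStrict('<div x:attr="test">foo</div>') !== null);
```

Callers then have a single value to test, at the cost of rejecting documents that libxml would otherwise load with a warning.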
 [2014-03-11 15:12 UTC] ab@php.net
Have you already played with the properties available on DOMDocument? That way you could influence the behavior of libxml to some extent; maybe that helps.

But generally I can only repeat that invalid XML has an unpredictable effect, even if you prefer to call it a libxml issue :) For instance, the namespace prefix string itself isn't that important; the important thing is the URI, which should have been defined. Two prefixes a: and b: in different documents might denote the same namespace if they were defined with the same URI. But none of this would matter if you had a perfectly valid document.
 [2014-03-11 15:29 UTC] goetas at lignano dot it
I agree with you that this is a libxml issue.

But it seems strange that:

// NOT valid XML
loadXML('...') // returns false

// Valid XML
loadXML('<div/>') // returns true

// NOT valid XML
loadXML('<t:div/>') // returns true but should be false

I do not know the libxml internals, but if it raises a warning, can that be used to set the "loadXML" return value to false?
 [2014-03-11 16:11 UTC] ab@php.net
No, I'm trying to say exactly the opposite: this is not a libxml issue. A valid XML document would be at least <root><div/></root>, not to mention specifying the XML version as well. See also http://www.w3.org/TR/REC-xml-names/#iri-use : if <ns:tag/> is given while ns is undefined, and therefore empty, but the document is otherwise somehow valid, it would be logical to cut off the empty URI on output. Obviously libxml should not produce invalid XML, but it might use a sort of "quirks mode" when reading input.
 [2014-03-11 16:33 UTC] goetas at lignano dot it
The section at http://www.w3.org/TR/REC-xml-names/#iri-use talks about
<ns:tag xmlns:tag=""/> (which is "more" valid than just "<ns:tag/>").


- With <ns:tag xmlns:tag=""/> we say that "" (the empty string) is bound to the "ns" prefix.
- With <ns:tag/>, the "ns" prefix simply cannot be resolved.

This is the reason why:

loadXML('<ns:tag xmlns:tag=""/>') // returns true (without any warning)

while

loadXML('<ns:tag/>') // returns true (but WITH some warnings)


I think that it should return false.
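
To see what "resolved" versus "cannot be resolved" means in the resulting DOM, the root element can be inspected after loading. This is a rough sketch with two assumptions: the helper name describeRoot() is hypothetical, and the declared case uses xmlns:ns with a made-up URI (urn:example), since the declaration must name the same prefix that the element uses. What the undeclared case reports is left unstated because, as discussed in this thread, it depends on the libxml version.

```php
<?php
// Hypothetical helper: loads $xml quietly and reports how the root element's
// name was interpreted. Returns null when there is no root element.
function describeRoot($xml)
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->documentElement;
    if ($root === null) {
        return null;
    }

    return array(
        'nodeName'     => $root->nodeName,
        'prefix'       => $root->prefix,
        'namespaceURI' => $root->namespaceURI,
    );
}

// Prefix declared: "ns" is bound to the (made-up) URI urn:example.
var_dump(describeRoot('<ns:tag xmlns:ns="urn:example"/>'));

// Prefix undeclared: the result may differ between libxml versions.
var_dump(describeRoot('<ns:tag/>'));
```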
 