Bug #42026 Incorrect handling of default namespaces
Submitted: 2007-07-18 11:09 UTC Modified: 2007-07-18 16:25 UTC
From: mail_ben_schmidt at yahoo dot com dot au Assigned:
Status: Not a bug Package: SimpleXML related
PHP Version: 5.2.3 / 5CVS OS: Mac OS X
Private report: No CVE-ID: None
 [2007-07-18 11:09 UTC] mail_ben_schmidt at yahoo dot com dot au
Description:
------------
When getting children/attributes without using methods to specify a namespace explicitly, non-prefixed names are returned, but not prefixed ones, even if the prefixed names are in the same namespace.

(Note that in fixing this bug there is a subtlety: the default namespace for attributes is inherited from the element in which they appear.)



Reproduce code:
---------------
$str=<<<DONE
<?xml version="1.0" encoding="utf-8"?>
<root
	xmlns="http://localhost/a"
	xmlns:ns="http://localhost/a"
	xmlns:other="http://localhost/b"
>
<elem attr="abc"/>1
<elem ns:attr="abc"/>1
<elem other:attr="abc"/>0
<ns:elem attr="abc"/>1
<ns:elem ns:attr="abc"/>1
<ns:elem other:attr="abc"/>0
<other:elem attr="abc"/>0
<other:elem ns:attr="abc"/>0
<other:elem other:attr="abc"/>1
</root>
DONE;

$xml = simplexml_load_string($str);
foreach ($xml->elem as $elem) {
	echo count($elem->attributes());
}
echo "\n";
foreach ($xml->children('http://localhost/a')->elem as $elem) {
	echo count($elem->attributes());
}
echo "\n";
foreach ($xml->children('http://localhost/b')->elem as $elem) {
	echo count($elem->attributes());
}
echo "\n";


Expected result:
----------------
110110 110110 101

Actual result:
--------------
100 100100 100

 [2007-07-18 11:12 UTC] mail_ben_schmidt at yahoo dot com dot au
Irrelevant to the bug itself, but one of the numbers I added to the code example as a hint was wrong. The lines should have read:

<other:elem attr="abc"/>1
<other:elem ns:attr="abc"/>0
<other:elem other:attr="abc"/>1
 [2007-07-18 11:59 UTC] rrichards@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Works as it should. Not specifying a namespace returns un-prefixed elements (no or default namespace), as no namespace testing is performed other than checking for no prefix. There is no default namespace for attributes.
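A minimal sketch of the behaviour described above, using a cut-down version of the document from the report. It only uses the namespace argument that children() and attributes() already accept; the variable names are illustrative. Passing the namespace URI selects by namespace regardless of prefix, while omitting it selects unprefixed names only.

```php
<?php
// Same namespace layout as the report: the default namespace and the
// "ns" prefix are bound to one URI, "other" to a different one.
$str = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<root xmlns="http://localhost/a"
      xmlns:ns="http://localhost/a"
      xmlns:other="http://localhost/b">
  <elem attr="abc"/>
  <ns:elem ns:attr="abc"/>
  <other:elem other:attr="abc"/>
</root>
XML;

$xml = simplexml_load_string($str);

// Passing the URI selects elements by namespace, whatever prefix they use.
$inA = $xml->children('http://localhost/a')->elem;
$countA = count($inA);

$unprefixed = [];
$inNamespaceA = [];
foreach ($inA as $elem) {
    // No argument: only attributes written without a prefix.
    $unprefixed[] = count($elem->attributes());
    // Explicit URI: only attributes that are really in that namespace.
    $inNamespaceA[] = count($elem->attributes('http://localhost/a'));
}

echo $countA, "\n";                    // 2
echo implode('', $unprefixed), "\n";   // 10
echo implode('', $inNamespaceA), "\n"; // 01
```

Code that must not depend on the author's choice of prefix can therefore always pass the URI explicitly.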
 [2007-07-18 13:48 UTC] mail_ben_schmidt at yahoo dot com dot au
Telling me to check the manual is somewhat useless, as this module is poorly documented, particularly in this area. The best 'documentation' available from what I can gather is at

http://devzone.zend.com/node/view/id/688#Heading3

where it states, "If no namespace is passed to the children() function, all the elements in the global namespace are returned."

This is clearly not the behaviour exhibited, because in the case in question the 'global namespace' *is* the namespace 'ns' (in the first six <elem>s anyway). Regardless of whether an element is in that namespace by virtue of a prefix or by default, it is still in the global namespace.

Purely from a logical perspective, though, the behaviour is clearly wrong. To the XML author

<root xmlns="a" xmlns:ns="a">
<elem attr="something"/>
</root>

is equivalent to

<root xmlns="a" xmlns:ns="a">
<ns:elem attr="something"/>
</root>

which is equivalent to

<root xmlns="a" xmlns:ns="a">
<ns:elem ns:attr="something"/>
</root>

and so on, and any author should have the right to expect both those snippets (and other such modifications) to be treated identically by any parser/XML binding system, as they mean exactly the same thing. The difference is merely one of stylistic choice or ease of implementation of some XML generator, and it shouldn't have the side-effect/ability of hiding things from some XML application/script.

I would be happy to have a go at fixing the bug unless the consensus of the developers truly is that this shouldn't be fixed. Or perhaps even if it is, as it would simplify my application. I can think of a couple of workarounds, but they aren't very pleasant.
 [2007-07-18 13:52 UTC] mail_ben_schmidt at yahoo dot com dot au
Oh, and yes, I did try the CVS version, and it behaves the same way. I didn't do 'make install' but just ran the binary in place; I presume this wouldn't have an impact. If that isn't a valid test, let me know, and I will reconfigure with some temporary prefix and 'make install'. I forgot to mention this in my previous post.
 [2007-07-18 14:34 UTC] rrichards@php.net
The behavior is correct and that statement was not exactly correct concerning the behavior. The intended behavior is to operate on non-prefixed elements without checking namespace or operating on elements within a specific namespace when instructed.

There is no bug here and those referenced documents are not exactly the same (an attribute without a prefix does not fall into default namespace like an element does).
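The point about unprefixed attributes can be checked independently of SimpleXML. The sketch below uses the DOM extension and a made-up one-element document (not taken from the report) to list the namespace URI of each attribute. The element picks up the default namespace, the unprefixed attribute does not.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<root xmlns="http://localhost/a" xmlns:ns="http://localhost/a">'
    . '<elem attr="abc" ns:attr="def"/>'
    . '</root>'
);

// No whitespace in the document, so the first child is the <elem> element.
$elem = $doc->documentElement->firstChild;
$elementNs = $elem->namespaceURI;

// Map each attribute's qualified name to its namespace URI.
$attrNs = [];
foreach ($elem->attributes as $attr) {
    $attrNs[$attr->nodeName] = $attr->namespaceURI;
}

var_dump($elementNs);         // string "http://localhost/a"
var_dump($attrNs['attr']);    // NULL: unprefixed attributes are in no namespace
var_dump($attrNs['ns:attr']); // string "http://localhost/a"
```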


 [2007-07-18 15:59 UTC] mail_ben_schmidt at yahoo dot com dot au
Hmmm. OK. I've had a quick look at the sources and thought a little about the problem, as well as about a related one that I asked a question about in a post to the php.xml.dev newsgroup. Here are some thoughts:

1. Although I believe the current behaviour is definitely wrong, it seems like others may not agree, though I would hope the majority would, because IMHO it has pretty serious implications and undermines the whole notion of the semantics of XML documents and their namespaces.

2. I can see that there are possibly benefits to not checking namespaces, e.g. for efficiency. And also that it may well 'just work' most of the time. It may be wise to at least document the current behaviour, to avoid confusion of others.

3. After another look at the XML namespace spec, I see what you mean about attribute names. I was confused by this sentence in the spec, which I took to mean that an attribute defaulted to the namespace of the element containing it: "Default namespace declarations do not apply directly to attribute names; the interpretation of unprefixed attributes is determined by the element on which they appear." But it is contradicted in the paragraph following it, so it must mean something else, but I can't for the life of me figure out what!

4. Another, more subtle issue is that an XML application is likely to want to deal with one particular namespace regularly throughout its work with an XML document, regardless of whether the XML author chose to make it the default namespace or not.

5. I was also interested in the behaviour described in

http://www.onlamp.com/pub/a/php/2004/01/15/simplexml.html?page=2

of a beta version of SimpleXML. It seems to me that this would be quite useful, despite the risks. And if you know the details of the documents you are dealing with, you can evaluate the risks anyway.

6. The upshot of all this is that I think a solution to all these problems could be defining two new methods:

setElementNamespaceMode($ns,$is_prefix)
setAttributeNamespaceMode($ns,$is_prefix)

This would set the namespace for all lookups, iterations, etc. 'from that point onwards'. The behaviour I want could be signalled by passing some constant that evaluates to an internal, illegal namespace name. The current behaviour could be signalled by set...NamespaceMode('',true) to check for an empty prefix (or another constant). Another magic constant could enable an 'ignore namespaces and return everything' behaviour like the beta. And of course a specific namespace could be set by passing it in directly. And you could switch namespaces by calling the methods multiple times. The namespace mode should be stored with the instance the method is called upon and propagated to instances derived from it (I mean returned by it, one way or another, whether by property lookup, array index, iteration, whatever). This means things like this would work:

$bunch_of_elems->set...NamespaceMode('a');
foreach ($bunch_of_elems as $elem) {
	...
	do_something_including_changing_namespace_to_b($elem);
	...
	// a is still the namespace mode of $elem, presuming $elem
	// wasn't passed by reference to the function above,
	// and also the namespace mode of $bunch_of_elems for the
	// next iteration of the foreach.
}

7. I should also say that in general, I think the SimpleXML extension is wonderful. Definitely has lots of potential. But I think it could be significantly more useful with this addition and have a lot more applications.

8. I would be willing to have a go implementing this, but I would probably benefit from a little guidance, as I have never done anything with Zend or Libxml before. And I haven't done all that much with XML and its namespaces before, either! But pretty much all the stuff I need is already in the SimpleXML source; it just needs a little rearranging and some control flow statements around. It looks like just editing it shouldn't be too steep a learning curve.

9. A bug report probably isn't the best place to have a discussion like this. Should we move it elsewhere? If you are willing to discuss further, that is. I'm quite excited by the prospect of making SimpleXML a bit better, and thus my PHP coding quite a bit...well...simpler, and probably helping others out a bit, too!
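The "namespace mode" idea in point 6 can be approximated in userland today. The following is only a sketch under that assumption: NamespaceView, withNamespace() and element() are hypothetical names invented for illustration and are not part of SimpleXML. It relies solely on the existing children($ns) call. Each wrapper remembers a namespace URI and hands it on to the wrappers it returns, which is the propagation described above.

```php
<?php
// Hypothetical helper, not part of SimpleXML: pairs a node with the
// namespace URI that lookups made through it should use.
final class NamespaceView
{
    private $node;
    private $ns;

    public function __construct(SimpleXMLElement $node, string $ns)
    {
        $this->node = $node;
        $this->ns = $ns;
    }

    // Child elements called $name in the stored namespace. Each result is
    // wrapped again, so the namespace carries over to derived views.
    public function children(string $name): array
    {
        $out = [];
        foreach ($this->node->children($this->ns)->{$name} as $child) {
            $out[] = new self($child, $this->ns);
        }
        return $out;
    }

    // Returns a new view with another namespace; this view is unchanged,
    // so a caller's mode cannot be altered behind its back.
    public function withNamespace(string $ns): self
    {
        return new self($this->node, $ns);
    }

    public function element(): SimpleXMLElement
    {
        return $this->node;
    }
}

$xml = simplexml_load_string(
    '<root xmlns="http://localhost/a" xmlns:ns="http://localhost/a"'
    . ' xmlns:other="http://localhost/b">'
    . '<elem/><ns:elem/><other:elem/></root>'
);

$view = new NamespaceView($xml, 'http://localhost/a');
$inA = count($view->children('elem'));
$inB = count($view->withNamespace('http://localhost/b')->children('elem'));

echo $inA, " ", $inB, "\n"; // 2 1
```

Making the view immutable is a design choice for this sketch; the proposal itself describes a setter that changes the mode in place.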
 [2007-07-18 16:01 UTC] mail_ben_schmidt at yahoo dot com dot au
(And apologies for opening the bug again, as you have clearly indicated you consider it bogus; wasn't sure whether you'd see my comment if I didn't, though, and keen to discuss it further.)
 [2007-07-18 16:25 UTC] jani@php.net
Please don't use the bug system as discussion board. Use email for that. :)
 