php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #44648 Attribute names not checked for well-formedness
Submitted: 2008-04-05 21:55 UTC Modified: 2008-04-17 20:24 UTC
Votes:1
Avg. Score:4.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: ezyang@php.net Assigned: rrichards (profile)
Status: Closed Package: DOM XML related
PHP Version: 5.2.5 OS: Windows Vista
Private report: No CVE-ID: None
 [2008-04-05 21:55 UTC] ezyang@php.net
Description:
------------
libxml2 is fairly lenient when it comes to what it allows to go into its nodes; you can set attributes and tags with illegal characters in them and it won't complain. The burden is on the userland code to perform an appropriate check with the xmlValidate*() functions.

PHP's DOM implementation is extremely spotty when it comes to these checks, which allows for some broken XML to easily be generated. For example,

$d = new DOMDocument();
$d->appendChild($n = $d->createElement('a'));
$n->setAttribute('"@', 'foo');
echo $d->saveXML();

outputs:

<?xml version="1.0"?>
<a "@="foo"/>

Which is clearly incorrect. However, if I attempt to

$d->createElement('a@');

DOM complains, because xmlValidateName was called on the element name.

Now, I actually don't mind the lack of checking; the DOM tree is useful for things like HTML, where the rules are slightly different from XMLs; an HTML tree can contain a "a@" node, although it would not be valid HTML. (You can try it out for yourself on Firefox by putting that in a document and then inspecting the DOM).

However, I want consistency, and I also want the ability to switch on strict checking when I so desire (especially when I'm producing XML). So I want all-or-nothing production checks in PHP DOM, adding another property in DOMDocument (or maybe even a global libxml configuration option) that specifies whether or not strict production checking should be done.


Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2008-04-05 22:54 UTC] rrichards@php.net
assign to self.

The strictness is dependent upon the DOM specs and setAttribute should 
be throwing an exception in that case. While I am going to go through 
and check other methods, let me know if you come across any others that 
are not validating names correctly.


 [2008-04-05 23:02 UTC] ezyang@php.net
IIRC, DOM does not make any demands on names or things like that. libxml2, which is known for its strictness, doesn't either. So, I'm still hoping that you'll let the checks be turned off. :-)

Some things from my investigation:

- Double hyphens (--) are not allowed in comments
- Most of the text inputs don't check for UTF-8 well-formedness. Haven't tested numeric character entities either, but those are suspicious
- Fake namespace declarations in attributes ($d->appendChild($d->createElement('foo:bar')); results in invalid XML, as foo namespace was never defined)

All these result in a $d->saveXML(); that is invalid XML, and probably some more.
 [2008-04-05 23:02 UTC] ezyang@php.net
One more: ]]> is not allowed in CDATA blocks.

I also suspect that the other XML extensions have bugs here.
 [2008-04-05 23:22 UTC] rrichards@php.net
You should read the specs more closely. Names are most certainly checked 
and a DOMException with an INVALID_CHARACTER_ERROR error is thrown.

Some of the others I need to look at because it is perfectly fine to 
create non well formed XML using DOM though it should error during 
serialization, so for those the bug would onl be in the saveXML routine. 
Other extensions it is not a bug because non-well formed XML support is 
required because the output when used in a larger context is well formed

 [2008-04-06 04:15 UTC] ezyang@php.net
You're right, that was my mistake.

I suppose that any of the places where the DOM specification could raise an exception would be the places where there needs to be error-checking.

Ought of curiosity, how come we don't use gdome2?
 [2008-04-06 14:06 UTC] geoffers+phpbugs at gmail dot com
While we should be throwing exceptions when strictErrorChecking is true, 
behaviour when it is false is undefined. I would suggest in that case to 
keep allowing everything.
 [2008-04-17 20:24 UTC] rrichards@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Dec 12 02:01:27 2024 UTC