Bug #67350 An XML DTD with an external parameter fails to include the subset
Submitted: 2014-05-28 02:10 UTC Modified: 2014-06-05 02:09 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (100.0%)
From: rjlaustin at teksavvy dot com Assigned:
Status: Open Package: DOM XML related
PHP Version: 5.4.28 OS: Linux
Private report: No CVE-ID: None

 [2014-05-28 02:10 UTC] rjlaustin at teksavvy dot com
Description:
------------
This problem occurs on two Linux systems where my code is used but over which I have no control and limited information. One is on version 5.4.28, built May 5 2014, the other is on 5.5.3-1, built April 4 2014. The problem did not occur on whatever versions they were running before last week.

The validation of XML against a DTD fails, complaining that one or more elements have no declaration. The message is:
DOMDocument::validate(): No declaration for element xxxx

All the elements complained of are in subsets that are (or rather were, because I have had to change them all) included in the outer DTD by means of the external parameter entity notation, e.g.
<!ENTITY % theatre SYSTEM "theatre.dtd">
later followed by
%theatre;

It appears that the contents of (in this case) "theatre.dtd" are simply lost.
Putting the contents of the file inline in the outer DTD is a successful workaround.

I have tested the latest Mac version of PHP available from XAMPP (5.5.11 built April 9) and it does NOT have this bug.

Since I have to support my code on any system, even if this problem gets fixed I will never be able to use this feature again because I will never know when all the potential users have moved up to the fixed release. It's an annoying loss of a convenient feature but it isn't a show-stopper.

Test script:
---------------
I'm afraid I am not in a position to ask the people who got the problem to restore their code so as to demonstrate it now. And I haven't got a system of my own that displays the problem. A fix will be of no use to me now, but if there is interest in getting it fixed and you need more documentation I could probably work up a little demonstration script.


History

 [2014-06-01 22:59 UTC] ant-dani-2 at hotmail dot com
+1

Same thing happens on my Ubuntu 14.04 server with PHP Version 5.5.9-1ubuntu4 (built Apr 9 2014).
Exactly the same situation when including an external DTD.
On my development WAMP server with PHP Version 5.5.12 (built Apr 30 2014), this bug does not occur.

Not sure if the bug was fixed as of PHP Version 5.5.10.
 [2014-06-02 15:38 UTC] ant-dani-2 at hotmail dot com
So I found out this is not really a bug, but the result of a security patch against XXE injection.
If you're just loading a local, trusted XML file, don't forget to explicitly tell libxml2 to load the external entities:
$xml = new DOMDocument();
$xml->load("your.xml", LIBXML_DTDLOAD | LIBXML_DTDATTR | LIBXML_DTDVALID);

This will "fix" your bug. I hope it helps you.
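
A slightly fuller sketch of the same idea, assuming a trusted local file whose DTD pulls in an external parameter entity. The helper name loadAndValidateTrusted() and the shape of its return value are illustrative only and not part of any PHP API; the DOMDocument methods and libxml_* functions it calls are the standard ones.

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): load a trusted local XML file
 * with DTD loading enabled, validate it, and collect any libxml messages.
 *
 * @return array{valid: bool, errors: string[]}
 */
function loadAndValidateTrusted(string $path): array
{
    // Collect libxml messages instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $loaded = $doc->load($path, LIBXML_DTDLOAD | LIBXML_DTDATTR | LIBXML_DTDVALID);
    $valid = $loaded && $doc->validate();

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message);
    }

    // Restore the caller's error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ['valid' => $valid, 'errors' => $errors];
}
```

Calling loadAndValidateTrusted("your.xml") and printing the 'errors' entries should make it easier to see which declarations, if any, are still missing. This is only appropriate for files you control.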
 [2014-06-02 17:24 UTC] requinix@php.net
-Status: Open +Status: Feedback
 [2014-06-02 17:24 UTC] requinix@php.net
@rjlaustin, please try @ant-dani-2's suggestion.
 [2014-06-03 01:45 UTC] rjlaustin at teksavvy dot com
-Status: Feedback +Status: Open
 [2014-06-03 01:45 UTC] rjlaustin at teksavvy dot com
It is certainly true that I have not set the option 'LIBXML_DTDLOAD' on my loadXML() call, and the documentation for that option ("Load the external subset") could refer to what is now sometimes not working - though I couldn't be sure from those four words alone.

However, to try @ant-dani-2's suggestion I would need a machine that displays the problem and I haven't got one. I have an elderly Windows (2000) system (running PHP Version 5.3.1 from November 2009) that probably won't run anything new enough to have the security patch against XXE injection that is said to be responsible; and I have a Mac laptop that is now running a very new XAMPP package - but the problem doesn't happen there.

It seems odd to me that a security patch would affect some systems in this way and not others - or perhaps the security patch hasn't been applied to all systems? But then, why not? Is security only a problem on Linux systems? And of course the fact that my code has worked for a long time (and still works in places) needs some explaining too - was that itself a bug, or was the LIBXML option setting never really implemented?

I would have thought that all PHP systems should behave alike (unless there is a pretty good reason for them not doing so). And if that is so, I think there is a bug here though it may not be what I have complained of.

I'm grateful for the suggestion and I'm sorry I cannot actually help by testing it out. If I find myself in position to test and confirm the suggestion I will certainly report the result.
 [2014-06-03 10:05 UTC] ant-dani-2 at hotmail dot com
Well, the problem on Linux is letting something like this
<!ENTITY % something SYSTEM "/etc/shadow">
pass through the parser, especially if the web server is running as root.
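
For input that is not trusted, one conservative option is to pass no DTD or entity flags at all and to refuse any document that carries a DOCTYPE. The following is a minimal sketch under that assumption; loadUntrustedXml() is a hypothetical name, and whether rejecting every DOCTYPE is acceptable depends on the application.

```php
<?php
/**
 * Hypothetical helper: parse XML from an untrusted source without loading
 * any DTD, and reject documents that declare a DOCTYPE at all.
 */
function loadUntrustedXml(string $xml): DOMDocument
{
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // LIBXML_NONET forbids network access; no DTD or entity flags are passed.
    $loaded = $doc->loadXML($xml, LIBXML_NONET);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new RuntimeException('XML could not be parsed');
    }
    if ($doc->doctype !== null) {
        throw new RuntimeException('DOCTYPE declarations are not accepted');
    }
    return $doc;
}
```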
 [2014-06-03 15:10 UTC] rjlaustin at teksavvy dot com
Thanks. I should have looked up XXE (which meant nothing to me) which I have now done, and I have read what the OWASP site has to say about the vulnerability.

Point 1. What I observe is this:
The PHP version 5.5.11 (April 9 2014) for the Mac, as distributed by XAMPP, allows access to an external subset within a DTD WHETHER OR NOT the option LIBXML_DTDLOAD is used on the PHP function call loadXML().

What I don't know is: is that good behaviour or bad behaviour?

If the purpose of the LIBXML_DTDLOAD option is to allow external file access, then I would have expected that when that option is absent external file access should NOT be allowed. But that may not be what LIBXML_DTDLOAD is supposed to do.
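
One way to answer that question on a given installation is to probe it directly. The sketch below is an assumption-laden test harness, not an official diagnostic: the function name externalSubsetIsUsed() and the throwaway fixture files are invented for illustration. It writes a small document whose only declaration for one element lives in an external parameter entity, then reports whether validation succeeds with a given set of options.

```php
<?php
/**
 * Hypothetical probe: returns true if a document that depends on an external
 * parameter entity validates when loaded with the given libxml options.
 */
function externalSubsetIsUsed(int $options): bool
{
    // Build a throwaway fixture in the system temp directory.
    $dir = sys_get_temp_dir() . '/dtdprobe_' . uniqid('', true);
    mkdir($dir);
    file_put_contents("$dir/theatre.dtd", "<!ELEMENT theatre (#PCDATA)>\n");
    file_put_contents("$dir/listing.xml", implode("\n", [
        '<!DOCTYPE listing [',
        '<!ELEMENT listing (theatre*)>',
        '<!ENTITY % theatre SYSTEM "theatre.dtd">',
        '%theatre;',
        ']>',
        '<listing><theatre>Globe</theatre></listing>',
    ]));

    $previous = libxml_use_internal_errors(true); // keep warnings out of the output
    $doc = new DOMDocument();
    $valid = $doc->load("$dir/listing.xml", $options) && $doc->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Clean up the fixture.
    unlink("$dir/listing.xml");
    unlink("$dir/theatre.dtd");
    rmdir($dir);

    return $valid;
}

printf("libxml2 version PHP was compiled against: %s\n", LIBXML_DOTTED_VERSION);
printf("without LIBXML_DTDLOAD: %s\n", var_export(externalSubsetIsUsed(0), true));
printf("with LIBXML_DTDLOAD:    %s\n", var_export(externalSubsetIsUsed(LIBXML_DTDLOAD), true));
```

If the first result is true, the installation reads the external subset regardless of the option; if it is false while the second is true, the option is what enables the access. The result without the flag is expected to vary between libxml2 builds, which is the point of the probe.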

Point 2. What I also observed was this:
On the Linux systems that experienced the problem I reported, only the subset of the DTD was not loaded; but the major file, the one named in the DOCTYPE element, was still being happily accessed. The OWASP article on this XXE vulnerability deals exclusively with that access; it doesn't mention the subset access that was what broke for me.

If the security patch that you identify as resulting in the behaviour on Linux that broke my code is actually supposed to prevent access to external files named in the DOCTYPE element, it is NOT working. It is allowing those files, but blocking (silently) only their subsets. Blocking subsets while still allowing the main file to be read doesn't make sense to me.

Do you, by any chance, have evidence that the security patch of which you write was added to the Linux build at the time when this bug (change in behaviour, if you prefer) first occurred?
 [2014-06-04 13:06 UTC] ant-dani-2 at hotmail dot com
If you look at http://uk.php.net/libxml.constants, you'll see that LIBXML_DTDLOAD is described as "Load the external subset".
The "subset" is (as far as I know) any entity with an external DTD. DOCTYPE is the set.

And I can say I do have evidence.
I was trying to embed the XHTML 1.0 Transitional DTD, with the intent of having several (unique id) HTML nodes that define different webpage layouts.
This would (and did) allow me to focus solely on the code behaviour, without my eyes bleeding from seeing mixed PHP/HTML.

Everything was working fine on Windows, but validation always failed on Linux.
I looked in Apache's error log, and saw warnings about "declaration not found for element ..."

The 'head' of my layout.xml is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE layout [
<!ELEMENT layout (html?)>
<!ENTITY % html SYSTEM "xhtml/xhtml1-transitional.dtd">
%html;
]>
 [2014-06-04 13:43 UTC] rjlaustin at teksavvy dot com
Thank you. I don't really want to bother you further but I don't think you quite understood my question about having evidence.

What I meant was: did you
(a) not have this problem on one version of Linux (say, version x)
(b) find that you DID get this problem on a newer version (say, version y)
(c) identify the security patch as being in version y and not in version x?

I cannot obtain this sort of evidence because I don't have the Linux system necessary to create it.

Do you have an ID for the security patch you have been talking about?
 [2014-06-04 19:24 UTC] ant-dani-2 at hotmail dot com
Ok, I get what you mean now.
1. I'm only using Ubuntu 14.04 x64
2. I found out about the security patch through libxml2's page on Ubuntu Launchpad (took some time to correlate the events).
3. For all LTS versions of Ubuntu, libxml2 received the patch on 2014-05-15
Source: https://launchpad.net/ubuntu/+source/libxml2
 [2014-06-05 02:09 UTC] rjlaustin at teksavvy dot com
Thanks once more. I looked at the libxml2 patch and it certainly seems as if it might be responsible for the change, but I can't say I understand exactly what this code is testing, i.e. in what circumstances it will return without doing anything.
if ((entity->etype == XML_EXTERNAL_PARAMETER_ENTITY) &&
    ((ctxt->options & XML_PARSE_NOENT) == 0) &&
    ((ctxt->options & XML_PARSE_DTDVALID) == 0) &&
    (ctxt->validate == 0))
    return;
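
Read literally, the quoted test returns early, so the entity's content is never fetched, only when all four conditions hold at once. A minimal sketch of that logic in PHP terms follows. The function name and parameters are hypothetical; the mapping of LIBXML_NOENT and LIBXML_DTDVALID onto XML_PARSE_NOENT and XML_PARSE_DTDVALID is assumed from PHP's constant names; and treating ctxt->validate as "validation during parsing is switched on" is likewise an assumption. It mirrors only the snippet quoted here, not libxml2 as a whole.

```php
<?php
/**
 * Hypothetical mirror of the quoted condition: true means the parser would
 * return without loading the external parameter entity.
 */
function externalParameterEntityIsSkipped(
    int $options,
    bool $validateWhileParsing = false,      // stands in for ctxt->validate (assumption)
    bool $isExternalParameterEntity = true   // stands in for the entity->etype check
): bool {
    $substituteEntities = ($options & LIBXML_NOENT) !== 0;
    $dtdValidOption     = ($options & LIBXML_DTDVALID) !== 0;

    return $isExternalParameterEntity
        && !$substituteEntities
        && !$dtdValidOption
        && !$validateWhileParsing;
}

// A few option combinations and how the quoted test would treat them.
var_dump(externalParameterEntityIsSkipped(0));               // no options
var_dump(externalParameterEntityIsSkipped(LIBXML_DTDLOAD));  // DTDLOAD alone
var_dump(externalParameterEntityIsSkipped(LIBXML_DTDVALID)); // validation requested
```

By this reading, passing LIBXML_DTDLOAD on its own would not change the outcome of the quoted test; of the flags suggested earlier in the thread, LIBXML_DTDVALID is the one it checks. Other code paths or later libxml2 revisions may behave differently.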

Anyway, you seem to have found very strong evidence that the problem is located in libxml2 rather than PHP itself.

So was the patch to libxml2 the reason my PHP code broke on the two Linux systems? The date of the patch seems to be 8 May and you say it was included in systems built on 15 May. The first of my two users to report the problem reported it on 20 May, but their phpinfo() display says the System is:
Linux gilly 3.11.0-19-generic #33-Ubuntu SMP Tue Mar 11 18:48:34 UTC 2014 x86_64
with a Build Date of:
Apr 4 2014 01:07:05
Could that patch be in libxml2 even though the build date and the system date are earlier than the patch date? It seems rather unlikely but I suppose it could be ...

So if anyone wants to close this now, as far as I'm concerned they can.
I wish the documentation on the 'LIBXML_DTDLOAD' option was just a bit clearer.
And I wish PHP behaved the same on all platforms so we could at least be sure which was the right behaviour.

But for anyone else who runs into the problem, obviously their first move should be to add the 'LIBXML_DTDLOAD' option to their load() or loadXML() call.

Thanks again for the explanations.
 