Request #48080 Add support for forcing DOM to validate a DOMDocument with a DTD
Submitted: 2009-04-26 17:17 UTC Modified: 2011-10-29 21:36 UTC
Votes:14
Avg. Score:4.6 ± 0.7
Reproduced:13 of 13 (100.0%)
Same Version:6 (46.2%)
Same OS:3 (23.1%)
From: jose dot rob dot jr at gmail dot com
Assigned:
Status: Open
Package: DOM XML related
PHP Version: 5.2.9
OS:
Private report: No
CVE-ID:

 [2009-04-26 17:17 UTC] jose dot rob dot jr at gmail dot com
Description:
------------
I need to validate XML files before loading them, so I created a DTD and hosted it.

With Python I can distribute the DTD file with the program and validate the XML file locally.
A Python example:
---
from lxml import etree
from StringIO import StringIO

xmlstart="""<?xml version="1.0" encoding="utf8"?>
<!DOCTYPE example PUBLIC '-//Example//Example DTD' 'http://example.com/mydtd.dtd'>"""

xmlok=xmlstart+"<example>The XML file</example>";
xmlinvalid=xmlstart+"<example><a>test</a>The XML file</example>";

dtddata="<!ELEMENT example (#PCDATA) >";

f=StringIO(dtddata);
dtd=etree.DTD(f);

print "Valid XML:";
xml1=etree.XML(xmlok);
validation=dtd.validate(xml1);
print validation;
print dtd.error_log.filter_from_errors();

print "Invalid XML:";
xml2=etree.XML(xmlinvalid);
validation=dtd.validate(xml2);
print validation;
print dtd.error_log.filter_from_errors();
----
The only way I have found to port this script is to use DOMDocument::validate(), but that method fetches the DTD from http://example.com/mydtd.dtd, so it is slower, generates traffic, and fails when example.com is offline.

I suggest adding an optional argument, as in DOMDocument::validate($source), where $source is a string containing the DTD source, to avoid situations like this.

Reproduce code:
---------------
<?php
$xmlstart=<<<XML
<?xml version="1.0" encoding="utf8"?>
<!DOCTYPE example PUBLIC '-//Example//Example DTD' 'http://example.com/mydtd.dtd'>
XML;

$xmlok=$xmlstart."<example>The XML file</example>";
$xmlinvalid=$xmlstart."<example><a>test</a>The XML file</example>";
$dtddata="<!ELEMENT example (#PCDATA) >";

print "<h1>Valid XML:</h1>";
$xml1=DOMDocument::loadXML($xmlok);
$validation=(int)$xml1->validate($dtddata); //Example that would work
print "<p><b>$validation</b></p>";
print "<h1>Invalid XML:</h1>";
$xml1=DOMDocument::loadXML($xmlinvalid);
$validation=(int)$xml1->validate($dtddata); //Example that would work
print "<p><b>$validation</b></p>";
?>

Expected result:
----------------
Valid XML:

1

Invalid XML:

Warning: DOMDocument::validate() [function.DOMDocument-validate]: Element example was declared #PCDATA but contains non text nodes in /script/path/xml.php on line 19

Warning: DOMDocument::validate() [function.DOMDocument-validate]: No declaration for element a in /script/path/xml.php on line 19

0

Actual result:
--------------
When no argument is passed to validate() and the DTD server is offline:

Valid XML:

Warning: DOMDocument::validate(http://example.com/mydtd.dtd) [function.DOMDocument-validate]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in /script/path/xml.php on line 14

Warning: DOMDocument::validate() [function.DOMDocument-validate]: I/O warning : failed to load external entity "http://example.com/mydtd.dtd" in /script/path/xml.php on line 14

Warning: DOMDocument::validate() [function.DOMDocument-validate]: Could not load the external subset "http://example.com/mydtd.dtd" in /script/path/xml.php on line 14

0

Invalid XML:

Warning: DOMDocument::validate(http://example.com/mydtd.dtd) [function.DOMDocument-validate]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in /script/path/xml.php on line 19

Warning: DOMDocument::validate() [function.DOMDocument-validate]: I/O warning : failed to load external entity "http://example.com/mydtd.dtd" in /script/path/xml.php on line 19

Warning: DOMDocument::validate() [function.DOMDocument-validate]: Could not load the external subset "http://example.com/mydtd.dtd" in /script/path/xml.php on line 19

0

History

 [2010-10-31 12:49 UTC] php at example dot com
It should also be noted that this affects any DOMDocuments using the standard XHTML SystemIDs. The W3C decided to block all requests to their URIs. See http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
 [2011-01-02 02:37 UTC] jani@php.net
-Package: Feature/Change Request +Package: DOM XML related
 [2011-08-29 05:08 UTC] cataphract@php.net
Added libxml_set_external_entity_loader() in PHP 5.4/trunk, which also solves this problem.
 [2011-09-08 09:49 UTC] bjori@php.net
-Assigned To: +Assigned To: cataphract
 [2011-09-08 09:49 UTC] bjori@php.net
Gustavo, so this is fixed?
 [2011-09-08 09:55 UTC] cataphract@php.net
The particular solution proposed in this request for handling the external DTD is not implemented, but you can now intercept the loading of such an external entity with the external entity loader. See https://svn.php.net/viewvc/php/php-src/trunk/ext/libxml/tests/libxml_set_external_entity_loader_basic.phpt?revision=315672&view=markup
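For readers who want to see what this workaround might look like, here is a minimal sketch rather than a definitive implementation. It assumes PHP 5.4 or later with the dom and libxml extensions. The $localDtds map and the validateWithLocalDtd() helper are hypothetical names introduced for illustration; only libxml_set_external_entity_loader(), libxml_use_internal_errors(), libxml_clear_errors(), DOMDocument::loadXML() and DOMDocument::validate() are real API. The DOCTYPE and DTD are the ones from the report, except that the encoding is written as UTF-8.

```php
<?php
// Hypothetical local catalog: public ID => DTD source shipped with the application.
$localDtds = [
    '-//Example//Example DTD' => '<!ELEMENT example (#PCDATA) >',
];

// Serve known DTDs from memory and refuse everything else,
// so validate() never has to contact example.com.
libxml_set_external_entity_loader(
    function ($publicId, $systemId, $context) use ($localDtds) {
        if ($publicId === null || !isset($localDtds[$publicId])) {
            return null; // unknown entity: do not load it
        }
        $stream = fopen('php://memory', 'r+');
        fwrite($stream, $localDtds[$publicId]);
        rewind($stream);
        return $stream;
    }
);

// Hypothetical helper: validate an XML string against the DTD named in its DOCTYPE.
function validateWithLocalDtd($xml)
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $valid = $doc->loadXML($xml) && $doc->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $valid;
}

$xmlStart = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example PUBLIC '-//Example//Example DTD' 'http://example.com/mydtd.dtd'>
XML;

var_dump(validateWithLocalDtd($xmlStart . '<example>The XML file</example>'));
var_dump(validateWithLocalDtd($xmlStart . '<example><a>test</a>The XML file</example>'));
```

Keying the map on the public ID keeps the documents unchanged: the DOCTYPE can still point at the hosted DTD, while the application answers the lookup locally. Note that the loader is global to libxml, so it also affects other code in the same process that loads external entities.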
 [2011-10-29 21:35 UTC] cataphract@php.net
-Status: Assigned +Status: Open
 [2011-10-29 21:36 UTC] cataphract@php.net
-Status: Assigned +Status: Open -Assigned To: cataphract +Assigned To:
 
Last updated: Fri Apr 18 10:03:03 2014 UTC