Request #48080 Add support for forcing DOM to validate a DOMDocument with a DTD
Submitted: 2009-04-26 17:17 UTC
Modified: 2021-07-18 04:22 UTC
Votes: 21
Avg. Score: 4.6 ± 0.7
Reproduced: 20 of 20 (100.0%)
Same Version: 8 (40.0%)
Same OS: 3 (15.0%)
From: jose dot rob dot jr at gmail dot com
Assigned: cmb
Status: No Feedback
Package: DOM XML related
PHP Version: 5.2.9
OS:
Private report: No
CVE-ID: None
 [2009-04-26 17:17 UTC] jose dot rob dot jr at gmail dot com
Description:
------------
I need to validate XML files before loading them, so I created a DTD and hosted it.

With Python I can distribute the DTD file with the program and validate the XML file locally.
A Python example:
---
from io import StringIO
from lxml import etree

# Bytes literals: lxml rejects str input that carries an encoding declaration.
xmlstart = b"""<?xml version="1.0" encoding="utf8"?>
<!DOCTYPE example PUBLIC '-//Example//Example DTD' 'http://example.com/mydtd.dtd'>"""

xmlok = xmlstart + b"<example>The XML file</example>"
xmlinvalid = xmlstart + b"<example><a>test</a>The XML file</example>"

# The DTD is supplied locally, so no request is made to example.com.
dtddata = "<!ELEMENT example (#PCDATA) >"
dtd = etree.DTD(StringIO(dtddata))

print("Valid XML:")
xml1 = etree.XML(xmlok)
validation = dtd.validate(xml1)
print(validation)
print(dtd.error_log.filter_from_errors())

print("Invalid XML:")
xml2 = etree.XML(xmlinvalid)
validation = dtd.validate(xml2)
print(validation)
print(dtd.error_log.filter_from_errors())
----
The only way I have found to port this script is to use DOMDocument::validate(), but that method fetches the DTD from http://example.com/mydtd.dtd, so it is slower, generates traffic, and fails when example.com is off-line.

I suggest adding a parameter, such as DOMDocument::validate($source), where $source is a string containing the DTD source, to avoid situations like this.

Reproduce code:
---------------
<?php
$xmlstart=<<<XML
<?xml version="1.0" encoding="utf8"?>
<!DOCTYPE example PUBLIC '-//Example//Example DTD' 'http://example.com/mydtd.dtd'>
XML;

$xmlok=$xmlstart."<example>The XML file</example>";
$xmlinvalid=$xmlstart."<example><a>test</a>The XML file</example>";
$dtddata="<!ELEMENT example (#PCDATA) >";

print "<h1>Valid XML:</h1>";
$xml1=DOMDocument::loadXML($xmlok);
$validation=(int)$xml1->validate($dtddata); //Example that would work
print "<p><b>$validation</b></p>";
print "<h1>Invalid XML:</h1>";
$xml1=DOMDocument::loadXML($xmlinvalid);
$validation=(int)$xml1->validate($dtddata); //Example that would work
print "<p><b>$validation</b></p>";
?>

Expected result:
----------------
Valid XML:

1

Invalid XML:

Warning: DOMDocument::validate() [function.DOMDocument-validate]: Element example was declared #PCDATA but contains non text nodes in /script/path/xml.php on line 19

Warning: DOMDocument::validate() [function.DOMDocument-validate]: No declaration for element a in /script/path/xml.php on line 19

0

Actual result:
--------------
When no argument is passed to validate and DTD server is off-line:

Valid XML:

Warning: DOMDocument::validate(http://example.com/mydtd.dtd) [function.DOMDocument-validate]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in /script/path/xml.php on line 14

Warning: DOMDocument::validate() [function.DOMDocument-validate]: I/O warning : failed to load external entity "http://example.com/mydtd.dtd" in /script/path/xml.php on line 14

Warning: DOMDocument::validate() [function.DOMDocument-validate]: Could not load the external subset "http://example.com/mydtd.dtd" in /script/path/xml.php on line 14

0

Invalid XML:

Warning: DOMDocument::validate(http://example.com/mydtd.dtd) [function.DOMDocument-validate]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in /script/path/xml.php on line 19

Warning: DOMDocument::validate() [function.DOMDocument-validate]: I/O warning : failed to load external entity "http://example.com/mydtd.dtd" in /script/path/xml.php on line 19

Warning: DOMDocument::validate() [function.DOMDocument-validate]: Could not load the external subset "http://example.com/mydtd.dtd" in /script/path/xml.php on line 19

0

History
 [2010-10-31 12:49 UTC] php at example dot com
It should also be noted that this affects any DOMDocuments using the standard XHTML SystemIDs. The W3C decided to block all requests to their URIs. See http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
 [2011-01-02 02:37 UTC] jani@php.net
-Package: Feature/Change Request +Package: DOM XML related
 [2011-08-29 05:08 UTC] cataphract@php.net
Added libxml_set_external_entity_loader() in PHP 5.4/trunk, which also solves this problem.
 [2011-09-08 09:49 UTC] bjori@php.net
-Assigned To: +Assigned To: cataphract
 [2011-09-08 09:49 UTC] bjori@php.net
Gustavo, so this is fixed?
 [2011-09-08 09:55 UTC] cataphract@php.net
The particular solution proposed in this request for handling the external DTD is not implemented, but you can now intercept the loading of such external entities with the external entity loader. See https://svn.php.net/viewvc/php/php-src/trunk/ext/libxml/tests/libxml_set_external_entity_loader_basic.phpt?revision=315672&view=markup
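For readers landing here, a minimal sketch of how the external entity loader could cover the case from the original report, assuming PHP 7 or later with ext/dom and ext/libxml. It follows the pattern of the linked test, not any official recipe. The names $dtdSource and validateAgainstLocalDtd() are hypothetical and exist only for illustration, and the public ID is the one from the report's DOCTYPE.

```php
<?php
// Sketch only: serve the DTD from a local string instead of fetching it over HTTP.
// $dtdSource and validateAgainstLocalDtd() are hypothetical names for illustration.
$dtdSource = '<!ELEMENT example (#PCDATA)>';

// Collect validation problems internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// libxml calls this loader whenever it needs an external entity, including
// the external DTD subset named in the DOCTYPE.
libxml_set_external_entity_loader(
    function ($publicId, $systemId, $context) use ($dtdSource) {
        if ($publicId === '-//Example//Example DTD') {
            // Hand libxml an in-memory stream that holds the DTD.
            $stream = fopen('php://temp', 'r+');
            fwrite($stream, $dtdSource);
            rewind($stream);
            return $stream;
        }
        return null; // anything else is not loaded
    }
);

function validateAgainstLocalDtd(string $xml): bool
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return false; // not well-formed
    }
    // validate() resolves the DOCTYPE's external subset through the loader above.
    return $doc->validate();
}

$xmlStart = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example PUBLIC '-//Example//Example DTD' 'http://example.com/mydtd.dtd'>
XML;

$okResult = validateAgainstLocalDtd($xmlStart . '<example>The XML file</example>');
$invalidResult = validateAgainstLocalDtd($xmlStart . '<example><a>test</a>The XML file</example>');

var_dump($okResult, $invalidResult);
```

With this loader registered, the first document should validate and the second should not, and no request to example.com should be made. The loader is process-wide, so a real application would probably want to map several public IDs to files shipped with the program.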
 [2011-10-29 21:35 UTC] cataphract@php.net
-Status: Assigned +Status: Open
 [2011-10-29 21:36 UTC] cataphract@php.net
-Status: Assigned +Status: Open -Assigned To: cataphract +Assigned To:
 [2021-07-07 13:43 UTC] cmb@php.net
-Status: Open +Status: Feedback -Assigned To: +Assigned To: cmb
 [2021-07-07 13:43 UTC] cmb@php.net
No more comments for more than ten years.  Can I assume that this
request is sufficiently resolved with the introduction of
libxml_set_external_entity_loader(), or not?
 [2021-07-18 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 