|   | php.net | support | documentation | report a bug | advanced search | search howto | statistics | random bug | login | 
| 
  [2012-09-29 19:56 UTC] vl dot homutov at gmail dot com
 Description:
------------
PHP's xml_parse() ignores external DTD specified in the
XML file and thus can't parse the file if it has
unknown entities (defined in the DTD mentioned).
Test script:
---------------
#!/usr/bin/php
<?php
$xml_ext_dtd=<<<EOXML
<?xml version="1.0"?>
<!DOCTYPE mytag SYSTEM "./mytag.dtd">
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem>/mytag>
EOXML;
$xml_int_dtd=<<<EOXML
<?xml version="1.0"?>
<!DOCTYPE mytag
[
<!ENTITY custom SYSTEM "file.txt">
]>
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem>/mytag>
EOXML;
function externalEntityHandler($parser, $name, $base, $systemId, $publicId)
{
	echo "PROCESS EXTERNAL REFERENCE(file=$systemId)\n";
	return true;
}
function characterDataHandler($parser, $data)
{
	echo "CDATA found: '$data'\n";
}
function xerr($parser)
{
	$out = "XML parser error:";
	$out.=xml_error_string(xml_get_error_code($parser));
	$out.="\n";
	return $out;
}
echo "This works OK - parse xml1:\n$xml_int_dtd\n";
echo "---------------------------------------\n";
$xml_parser = xml_parser_create();
xml_set_character_data_handler($xml_parser, "characterDataHandler");
xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");
xml_parse($xml_parser, $xml_int_dtd) or die(xerr($xml_parser));
echo "\nThis FAILS - parse xml2:\n$xml_ext_dtd\n";
echo "---------------------------------------\n";
$xml_parser = xml_parser_create();
xml_set_character_data_handler($xml_parser, "characterDataHandler");
xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");
$rv = xml_parse($xml_parser, $xml_ext_dtd);
if (!$rv) echo xerr($xml_parser);
echo "file 'mytag.dtd' is:\n".file_get_contents("./mytag.dtd");
?>
Expected result:
----------------
This works OK - parse xml1:
<?xml version="1.0"?>
<!DOCTYPE mytag
[
<!ENTITY custom SYSTEM "file.txt">
]>
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem>/mytag>
---------------------------------------
CDATA found: 'one'
CDATA found: 'two'
PROCESS EXTERNAL REFERENCE(file=file.txt)
Actual result:
--------------
This FAILS - parse xml2:
<?xml version="1.0"?>
<!DOCTYPE mytag SYSTEM "./mytag.dtd">
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem>/mytag>
---------------------------------------
CDATA found: 'one'
CDATA found: 'two'
XML parser error:Undeclared entity warning
file 'mytag.dtd' is:
<!ENTITY custom SYSTEM "file.txt">
PatchesPull RequestsHistoryAllCommentsChangesGit/SVN commits             | |||||||||||||||||||||||||||
|  Copyright © 2001-2025 The PHP Group All rights reserved. | Last updated: Fri Oct 31 16:00:01 2025 UTC | 
it would be nice to document all the limitations that this parser has. It's a bit pity to find out that despite having xml_set_external_entity_ref_handler() it doesn't work as expected. Another issue I've found recently is that newlines are not preserved in attributes. i.e. <some_tag id="Some aligned long text here like long title" more="more_attrs"> will read as atrrs[id]="Some aligned long text here like..." The workaround is to parse manually external DTD and inject file contents it points to into parsed file, so all entites get loaded. ugly, yes, but works for me.