Bug #41041 Relax NG validation fails with DTD defined entities
Submitted: 2007-04-10 15:50 UTC Modified: 2007-04-10 17:00 UTC
From: bleathem at gmail dot com Assigned: rrichards (profile)
Status: Not a bug Package: DOM XML related
PHP Version: 5.2.1 OS: Mac OS X 10.4.9
Private report: No CVE-ID: None
 [2007-04-10 15:50 UTC] bleathem at gmail dot com
Description:
------------
I've created an XML file that uses a doctype to define HTML entities, and an accompanying Relax NG schema to validate it. If the XML element that contains the entity follows a Relax NG <interleave> block, the validation fails. The accompanying source code demonstrates this. The variable $xml is validated against two Relax NG schemas: one where the preceding elements are contained in an <interleave> block, and one where they are not. The validation fails only in the <interleave> case.

I have tried the same validation with an online validator (Java-based, uses Jing), see:
http://hsivonen.iki.fi/validator/
so the problem is not the XML or the Relax NG schema, but rather the validator itself.

I have found other circumstances where the presence of entities causes the validation to fail; I can provide these if necessary.

Reproduce code:
---------------
<?php

$xml = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root[
  <!ENTITY mu "mu">
]>
<eec:field xmlns:eec="http://www.triumf.info/common/xml/eec"
          xmlns="http://www.w3.org/1999/xhtml">

    <eec:title>Isotope</eec:title>
    <eec:name>sec_isotope</eec:name>
    <eec:type>text</eec:type>
    <eec:value>&mu;</eec:value>
</eec:field>
EOF;

$rng = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:eec="http://www.triumf.info/common/xml/eec">
         
  <start>
    <element name="eec:field">
      <interleave>
      <element name="eec:title"><text/></element>
      <element name="eec:name"><text/></element>
      <element name="eec:type"><text/></element>
      </interleave>
      <element name="eec:value"><text/></element>
    </element>
  </start>
    
</grammar>
EOF;

$rng2 = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:eec="http://www.triumf.info/common/xml/eec">
         
  <start>
    <element name="eec:field">
      <element name="eec:title"><text/></element>
      <element name="eec:name"><text/></element>
      <element name="eec:type"><text/></element>
      <element name="eec:value"><text/></element>
    </element>
  </start>
    
</grammar>
EOF;

ini_set( 'track_errors', 1);
ini_set('error_reporting', E_ALL | E_STRICT);

$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_DTDLOAD|LIBXML_DTDATTR);

echo "<h3>1st Time</h3>";
if ($dom->relaxNGValidateSource($rng)) echo "Relax NG validated";
else echo $php_errormsg;

echo "<h3>2nd Time</h3>";
if ($dom->relaxNGValidateSource($rng2)) echo "Relax NG validated";
else echo $php_errormsg;
?> 

Expected result:
----------------
1st Time
Relax NG validated
2nd Time
Relax NG validated

Actual result:
--------------
1st Time
Element value has extra content: mu
2nd Time
Relax NG validated

 [2007-04-10 16:11 UTC] rrichards@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

You need to substitute the entities by passing LIBXML_NOENT when loading the XML into the DOM document.
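
For readers who land here, a minimal sketch of that suggestion, reusing the document and the <interleave> schema from the report (trimmed slightly). The only change to the reported call is the added LIBXML_NOENT flag. Using libxml_use_internal_errors() and libxml_get_errors() instead of track_errors and $php_errormsg is an assumption made here for newer PHP versions, not part of the original report.

```php
<?php
// Document from the report: the entity "mu" is declared in the internal DTD subset.
$xml = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root[
  <!ENTITY mu "mu">
]>
<eec:field xmlns:eec="http://www.triumf.info/common/xml/eec">
    <eec:title>Isotope</eec:title>
    <eec:name>sec_isotope</eec:name>
    <eec:type>text</eec:type>
    <eec:value>&mu;</eec:value>
</eec:field>
EOF;

// The <interleave> schema that failed in the report.
$rng = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:eec="http://www.triumf.info/common/xml/eec">
  <start>
    <element name="eec:field">
      <interleave>
        <element name="eec:title"><text/></element>
        <element name="eec:name"><text/></element>
        <element name="eec:type"><text/></element>
      </interleave>
      <element name="eec:value"><text/></element>
    </element>
  </start>
</grammar>
EOF;

// Collect libxml errors ourselves (assumption: replaces track_errors/$php_errormsg).
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// LIBXML_NOENT substitutes entity references with their replacement text at load
// time, so <eec:value> holds a plain text node instead of an entity reference node.
$dom->loadXML($xml, LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_DTDATTR);

$value = $dom->getElementsByTagNameNS('http://www.triumf.info/common/xml/eec', 'value')->item(0);
$valid = $dom->relaxNGValidateSource($rng);

if ($valid) {
    echo "Relax NG validated\n";
} else {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
}
```

Without LIBXML_NOENT, the child of <eec:value> is an entity reference node rather than text, which is what the validator reports as extra content in the <interleave> case.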
 [2007-04-10 17:00 UTC] bleathem at gmail dot com
Thanks, this solved my problem exactly.

And sorry for wasting your time. I did read through the PHP documentation extensively; however, I was looking in the section on DOM rather than libxml.

Believe me, the effort involved in submitting a bug report (installing the latest version, writing sample code, etc. etc.) makes it a last resort!
 