Bug #41041 Relax NG validation fails with DTD defined entities
Submitted: 2007-04-10 15:50 UTC Modified: 2007-04-10 17:00 UTC
From: bleathem at gmail dot com Assigned: rrichards
Status: Not a bug Package: DOM XML related
PHP Version: 5.2.1 OS: Mac OS X 10.4.9
Private report: No CVE-ID: None

 [2007-04-10 15:50 UTC] bleathem at gmail dot com
Description:
------------
I've created an XML file that uses a DOCTYPE to define HTML entities, and an accompanying Relax NG schema to validate it. If the XML element that contains the entity follows a Relax NG <interleave> block, validation fails. The reproduce code below demonstrates this: the variable $xml is validated against two Relax NG schemas, one in which the preceding elements are contained in an <interleave> block and one in which they are not. Validation fails only in the <interleave> case.

I have tried the same validation with an online validator (Java-based, using Jing), see:
http://hsivonen.iki.fi/validator/
It passes there, so the problem is not the XML or the Relax NG schema, but rather the validator itself.

I have found other circumstances where the presence of entities causes validation to fail; I can provide these if necessary.

Reproduce code:
---------------
<?php

$xml = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root[
  <!ENTITY mu "mu">
]>
<eec:field xmlns:eec="http://www.triumf.info/common/xml/eec"
          xmlns="http://www.w3.org/1999/xhtml">

    <eec:title>Isotope</eec:title>
    <eec:name>sec_isotope</eec:name>
    <eec:type>text</eec:type>
    <eec:value>&mu;</eec:value>
</eec:field>
EOF;

$rng = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:eec="http://www.triumf.info/common/xml/eec">
         
  <start>
    <element name="eec:field">
      <interleave>
      <element name="eec:title"><text/></element>
      <element name="eec:name"><text/></element>
      <element name="eec:type"><text/></element>
      </interleave>
      <element name="eec:value"><text/></element>
    </element>
  </start>
    
</grammar>
EOF;

$rng2 = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:eec="http://www.triumf.info/common/xml/eec">
         
  <start>
    <element name="eec:field">
      <element name="eec:title"><text/></element>
      <element name="eec:name"><text/></element>
      <element name="eec:type"><text/></element>
      <element name="eec:value"><text/></element>
    </element>
  </start>
    
</grammar>
EOF;

// track_errors populates $php_errormsg (deprecated in PHP 7.2, removed in PHP 8.0)
ini_set('track_errors', 1);
ini_set('error_reporting', E_ALL | E_STRICT);

$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_DTDLOAD|LIBXML_DTDATTR);

echo "<h3>1st Time</h3>";
if ($dom->relaxNGValidateSource($rng)) echo "Relax NG validated";
else echo $php_errormsg;

echo "<h3>2nd Time</h3>";
if ($dom->relaxNGValidateSource($rng2)) echo "Relax NG validated";
else echo $php_errormsg;
?> 

Expected result:
----------------
1st Time
Relax NG validated
2nd Time
Relax NG validated

Actual result:
--------------
1st Time
Element value has extra content: mu
2nd Time
Relax NG validated
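One plausible reading of the "extra content" message is that, without entity substitution, the parser keeps &mu; as an entity reference node rather than text, and the Relax NG validator then treats that node as unexpected content inside eec:value. A minimal sketch for checking which node is actually built, assuming PHP 8 with the dom extension; it reuses the report's document, and the helper name first_value_child() is hypothetical:

```php
<?php
// Sketch (assumes PHP 8 with ext/dom): inspect the node libxml builds for &mu;
// inside eec:value, with and without entity substitution.

$xml = <<<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root[
  <!ENTITY mu "mu">
]>
<eec:field xmlns:eec="http://www.triumf.info/common/xml/eec"
          xmlns="http://www.w3.org/1999/xhtml">
    <eec:title>Isotope</eec:title>
    <eec:name>sec_isotope</eec:name>
    <eec:type>text</eec:type>
    <eec:value>&mu;</eec:value>
</eec:field>
EOF;

// Hypothetical helper: load $xml with the given libxml options and return the
// first child of the eec:value element (the node keeps its document alive).
function first_value_child(string $xml, int $options): DOMNode
{
    $dom = new DOMDocument();
    $dom->loadXML($xml, $options);
    $value = $dom->getElementsByTagNameNS(
        'http://www.triumf.info/common/xml/eec',
        'value'
    )->item(0);
    return $value->firstChild;
}

// Same flags as the report: the entity is left as a reference node.
$raw = first_value_child($xml, LIBXML_DTDLOAD | LIBXML_DTDATTR);
// Adding LIBXML_NOENT asks libxml to replace the entity with its text.
$sub = first_value_child($xml, LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_DTDATTR);

printf("%s (nodeType %d)\n", get_class($raw), $raw->nodeType);
printf("%s (nodeType %d) %s\n", get_class($sub), $sub->nodeType, $sub->nodeValue);
```

The first line should report a DOMEntityReference and the second a DOMText holding "mu".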

History

 [2007-04-10 16:11 UTC] rrichards@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

You need to substitute the entities by passing LIBXML_NOENT when loading the XML into the DOM document.
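A minimal sketch of the suggested fix on a current PHP, assuming PHP 8 with the dom extension: LIBXML_NOENT is added to the report's loadXML() flags so that &mu; becomes text before validation. Because track_errors and $php_errormsg were removed in PHP 8.0, this sketch collects the validator's messages with libxml_use_internal_errors() and libxml_get_errors() instead. The helper name validate_rng() is hypothetical; the document and the <interleave> schema are the ones from the report.

```php
<?php
// Sketch (assumes PHP 8 with ext/dom): load with LIBXML_NOENT, then validate
// against the report's <interleave> schema, collecting libxml messages.

$xml = <<<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root[
  <!ENTITY mu "mu">
]>
<eec:field xmlns:eec="http://www.triumf.info/common/xml/eec"
          xmlns="http://www.w3.org/1999/xhtml">
    <eec:title>Isotope</eec:title>
    <eec:name>sec_isotope</eec:name>
    <eec:type>text</eec:type>
    <eec:value>&mu;</eec:value>
</eec:field>
EOF;

$rng = <<<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:eec="http://www.triumf.info/common/xml/eec">
  <start>
    <element name="eec:field">
      <interleave>
        <element name="eec:title"><text/></element>
        <element name="eec:name"><text/></element>
        <element name="eec:type"><text/></element>
      </interleave>
      <element name="eec:value"><text/></element>
    </element>
  </start>
</grammar>
EOF;

// Hypothetical helper: returns [bool $valid, string[] $messages].
function validate_rng(DOMDocument $dom, string $schema): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors instead of emitting warnings
    libxml_clear_errors();
    $valid = $dom->relaxNGValidateSource($schema);
    $messages = array_map(fn (LibXMLError $e) => trim($e->message), libxml_get_errors());
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return [$valid, $messages];
}

$dom = new DOMDocument();
// LIBXML_NOENT replaces &mu; with its text; the other flags are as in the report.
$dom->loadXML($xml, LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_DTDATTR);

[$valid, $messages] = validate_rng($dom, $rng);
echo $valid ? "Relax NG validated\n" : implode("\n", $messages) . "\n";
```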
 [2007-04-10 17:00 UTC] bleathem at gmail dot com
Thanks, this solved my problem exactly.

And sorry for wasting your time. I did read through the PHP documentation extensively; however, I was looking in the section on DOM rather than the one on libxml.

Believe me, the effort involved in submitting a bug report (installing the latest version, writing sample code, etc. etc.) makes it a last resort!
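Since the submitter had been reading the DOM section of the manual, it may help to note the DOM-side counterpart of LIBXML_NOENT: the DOMDocument::$substituteEntities property (a PHP extension, not part of the W3C DOM), which PHP consults when the document is loaded. A minimal sketch, assuming PHP 8 with the dom extension and a cut-down version of the report's document:

```php
<?php
// Sketch (assumes PHP 8 with ext/dom): enable entity substitution through the
// DOMDocument property instead of passing the LIBXML_NOENT flag.

$xml = <<<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root[
  <!ENTITY mu "mu">
]>
<eec:field xmlns:eec="http://www.triumf.info/common/xml/eec">
  <eec:value>&mu;</eec:value>
</eec:field>
EOF;

$dom = new DOMDocument();
$dom->substituteEntities = true; // must be set before loading to take effect
$dom->loadXML($xml);

$value = $dom->getElementsByTagNameNS(
    'http://www.triumf.info/common/xml/eec',
    'value'
)->item(0);

// With substitution enabled, eec:value should now hold a plain text node.
var_dump($value->firstChild instanceof DOMText, $value->textContent);
```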
 