Bug #51484 '--' incorrectly allowed inside comments
Submitted: 2010-04-05 23:48 UTC Modified: 2010-04-07 14:25 UTC
From: ifland at gmail dot com Assigned:
Status: Not a bug Package: XML related
PHP Version: 5.2.13 OS: *
Private report: No CVE-ID: None

 [2010-04-05 23:48 UTC] ifland at gmail dot com
Description:
------------
According to the XML spec (see http://www.w3.org/TR/2008/REC-xml-20081126/#sec-comments ), comments in XML must not contain two consecutive hyphens. This situation can occasionally arise when poorly formed HTML documents are processed as input.

The spec does not suggest how to handle this situation. We can't turn the hyphens into entities (entity references are not recognized inside comments either), yet Firefox and possibly other browsers will fail to parse XML documents whose comments contain a double hyphen.
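
The rule from the spec can be checked in userland before any output is produced. What follows is a minimal sketch, assuming a hypothetical helper named isValidXmlCommentText() (it is not part of PHP or libxml2). It relies on the Comment production in the spec, which forbids "--" anywhere in the comment text and also forbids comment text that ends in "-".

```php
<?php
// Hypothetical helper (not a PHP or libxml2 function): returns true when
// $text may legally appear between "<!--" and "-->" in an XML document.
function isValidXmlCommentText($text)
{
    // The XML grammar forbids two consecutive hyphens inside a comment.
    if (strpos($text, '--') !== false) {
        return false;
    }
    // It also forbids a trailing hyphen, which would produce "--->".
    if (substr($text, -1) === '-') {
        return false;
    }
    return true;
}

var_dump(isValidXmlCommentText('comment <!--commented comment')); // bool(false)
var_dump(isValidXmlCommentText('plain comment'));                 // bool(true)
```

A check like this could be used to raise a catchable error before serializing, which is one of the two outcomes asked for in this report.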



Test script:
---------------
<?php
$doc = new DOMDocument();
$doc->loadHTML("<html><body><!--comment <!--commented comment--></body>");
header("Content-type: text/plain");
echo $doc->saveXML();
?>

Expected result:
----------------
Either a catchable error or something like this:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><!--comment <!- -commented comment--></body></html>


Actual result:
--------------
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><!--comment <!--commented comment--></body></html>


 [2010-04-07 14:25 UTC] iliaa@php.net
-Status: Open +Status: Bogus
 [2010-04-07 14:25 UTC] iliaa@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

The handling of this is done by libxml2, not PHP. Also, you are using loadHTML(),
which is designed to handle non-well-formed HTML.
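
Since the serialization is left to libxml2, one possible workaround is to clean up comment nodes in userland before calling saveXML(). This is only a sketch under stated assumptions: sanitizeXmlComments() is a hypothetical helper name, and inserting a space between adjacent hyphens is just one arbitrary choice of repair, since the spec does not prescribe one.

```php
<?php
// Hypothetical helper: rewrite every comment node in $doc so that its text
// is legal inside an XML comment (no "--", no trailing "-").
function sanitizeXmlComments(DOMDocument $doc)
{
    $xpath = new DOMXPath($doc);
    // "//comment()" selects all comment nodes, including top-level ones.
    foreach ($xpath->query('//comment()') as $comment) {
        $text = $comment->data;
        // Put a space after each hyphen that is followed by another hyphen,
        // so "--" becomes "- -" and "---" becomes "- - -".
        $text = preg_replace('/-(?=-)/', '- ', $text);
        // Pad a trailing hyphen so the closing delimiter stays "-->".
        if (substr($text, -1) === '-') {
            $text .= ' ';
        }
        $comment->data = $text;
    }
}

// Usage with the input from this report.
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML("<html><body><!--comment <!--commented comment--></body>");
sanitizeXmlComments($doc);
$xml = $doc->saveXML();
```

The change alters the comment text, so it is only suitable when comment contents do not need to round-trip exactly.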