Bug #28628 PHP PI problem with dom->loadHTML
Submitted: 2004-06-04 00:46 UTC Modified: 2004-06-07 09:41 UTC
From: bart at mediawave dot nl Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.0.0RC2 OS: WinXP
Private report: No CVE-ID: None

 [2004-06-04 00:46 UTC] bart at mediawave dot nl
Description:
------------
When loading a W3C-valid HTML 4.01 string with DOMDocument->loadHTML(), DOM fails on PHP processing instructions (<?php ... ?>): the parser emits a warning and the instruction is dropped from the document.

The HTML string is valid HTML 4.01 Transitional according to the W3C validator:

http://validator.w3.org/check?uri=http%3A%2F%2Fwww.mediawave.nl%2Fhtmlfile.htm&charset=%28detect+automatically%29&doctype=%28detect+automatically%29&ss=1&verbose=1

Reproduce code:
---------------
<?php

$html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>
<p><?php echo "hello? world? are you there? Can you see me? :(" ?></p>
</body>
</html>';

$dom = new DomDocument;
$dom->loadHTML($html);
echo '<pre>', htmlspecialchars($dom->saveHTML()), '</pre>';

?>

Expected result:
----------------
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body><p><?php echo "hello? world? are you there? Can you see me? :(" ?></p></body>
</html>

Actual result:
--------------
Warning: DOMDocument::loadHTML() [function.loadHTML]: htmlParseStartTag: invalid element name in Entity, line: 9 in D:\Inetpub\wwwroot\test2.php on line 24

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body><p></p></body>
</html>
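To inspect that diagnostic from code instead of as a PHP warning, libxml's internal error buffer can be used. A minimal sketch, assuming a reduced one-paragraph input rather than the full page; which errors (if any) are reported depends on the libxml2 version PHP is linked against, so no particular output is assumed here:

```php
<?php
// Buffer libxml2 parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Reduced input (assumption): just the paragraph containing the PHP PI.
$ok = $dom->loadHTML('<p><?php echo "hi" ?></p>');

// Each entry is a LibXMLError object (level, code, line, column, message).
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Empty the buffer and restore normal warning behaviour.
libxml_clear_errors();
libxml_use_internal_errors(false);
```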

History

 [2004-06-07 09:41 UTC] chregu@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions. 

Thank you for your interest in PHP.

HTML 4 does not know anything about processing
instructions, so the HTML parser of libxml2
chokes on them (and PHP can't change that). Convert the
document to XHTML and use the XML parser (loadXML())
instead; then it works.
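A minimal sketch of that suggestion, assuming an XHTML conversion of the reporter's page (the markup below is an assumed conversion, not taken from the report, and the DOCTYPE is left out so nothing needs the external DTD). The DOMXPath query is only there to show that the instruction survives as a DOMProcessingInstruction node:

```php
<?php
// Assumed XHTML version of the reporter's page (DOCTYPE omitted).
$xhtml = <<<'XHTML'
<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body>
<p><?php echo "hello? world? are you there? Can you see me? :(" ?></p>
</body>
</html>
XHTML;

$dom = new DOMDocument();
// The XML parser accepts <?php ... ?> as a processing instruction.
$ok = $dom->loadXML($xhtml);

// Select every processing instruction whose target is "php".
$xpath = new DOMXPath($dom);
$pis = $xpath->query('//processing-instruction("php")');

foreach ($pis as $pi) {
    // target is "php"; data is the code between the target and "?>".
    echo $pi->target, ': ', trim($pi->data), "\n";
}

// saveXML() writes the instruction back out inside the <p> element.
echo $dom->saveXML();
```

The XML declaration at the top is not a processing instruction node in the DOM, so the query only matches the PHP block.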

 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Apr 19 21:01:30 2024 UTC