Bug #80679 if loaded document starts with text, some nodes will be lost
Submitted: 2021-01-28 10:51 UTC Modified: 2021-01-28 10:55 UTC
From: andrey at email dot dp dot ua Assigned: cmb
Status: Not a bug Package: DOM XML related
PHP Version: Irrelevant OS: Debian Linux
Private report: No CVE-ID: None
 [2021-01-28 10:51 UTC] andrey at email dot dp dot ua
If a document starts with text, loading it with loadHTML() corrupts some element nodes:

1. The text at the beginning of the document is wrapped in <p></p> tags.

2. The <p> element holding that text is placed after the document type declaration.

3. The document's <html> node is lost, but its child nodes are preserved.

4. The <head> node is lost, but its child nodes are preserved.

5. If the document has no <head> element but has some other element in its place, that element ends up inside the <p></p> tags together with the starting text.
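To see which of the effects listed above occur with a given libxml build, it can help to dump the parsed tree. A minimal sketch, assuming a hypothetical helper describe_tree() (not part of PHP's DOM API) that returns one indented line per node:

```php
<?php
// Hypothetical helper (not part of PHP's DOM API): returns one line per node,
// indented by depth, so it is easy to see which elements loadHTML() kept,
// dropped or moved.
function describe_tree(DOMNode $node, int $depth = 0): array
{
    // nodeName is the tag name for elements and "#text" for text nodes.
    $lines = [str_repeat('  ', $depth) . $node->nodeName];
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            // Skip whitespace-only text nodes to keep the dump short.
            if ($child instanceof DOMText && trim($child->textContent) === '') {
                continue;
            }
            $lines = array_merge($lines, describe_tree($child, $depth + 1));
        }
    }
    return $lines;
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML("some text <!DOCTYPE html>\n<html>\n<head>\n<title>Title of the document</title>\n</head>\n<body>\nThe content of the document......\n</body>\n</html>");
if ($dom->documentElement !== null) {
    echo implode("\n", describe_tree($dom->documentElement)), "\n";
}
```

Running this against different libxml versions should make the differences described below directly comparable.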

The behaviour is different for some versions of libxml:

libxml 2.9.1:
The starting text, wrapped in <p></p> tags, and the child nodes of the <head> element are placed before the <body> node.

libxml 2.9.4:
The starting text, wrapped in <p></p> tags, and the child nodes of the <head> element are placed inside the <body> node.
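Since the result depends on the libxml version, it is worth recording which one PHP was built with when reproducing this. A sketch using the built-in LIBXML_DOTTED_VERSION constant; has_libxml_at_least() is a hypothetical helper. The constant reflects the version PHP was compiled against; phpinfo() should also list the version actually loaded, which may differ.

```php
<?php
// LIBXML_DOTTED_VERSION is the libxml2 version PHP was compiled against,
// for example "2.9.4"; the library loaded at run time may be a different one.
function has_libxml_at_least(string $minimum): bool
{
    // version_compare() understands dotted version strings such as "2.9.1".
    return version_compare(LIBXML_DOTTED_VERSION, $minimum, '>=');
}

echo 'libxml2 (compiled against): ', LIBXML_DOTTED_VERSION, "\n";
echo has_libxml_at_least('2.9.4') ? "2.9.4 or newer\n" : "older than 2.9.4\n";
```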

Test script:

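<?php
// The input starts with bare text, before the DOCTYPE declaration.
$html = "some text <!DOCTYPE html>
<html>
<head>
<title>Title of the document</title>
</head>

<body>
The content of the document......
</body>

</html>";

echo "\n\nStarting with text\n";
echo "----------------------\n";

$dom = new \DOMDocument();
$dom->formatOutput = true;
$dom->loadHTML($html);
echo $dom->saveHTML();

// Rename <head> and </head> to <not_head> and </not_head>, so the document has no head element.
$html = str_replace('head>', 'not_head>', $html);

echo "\n\nStarting with text and no head node\n";
echo "----------------------\n";

$dom = new \DOMDocument();
$dom->formatOutput = true;
$dom->loadHTML($html);
echo $dom->saveHTML();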

Expected result:
Starting with text
some text <!DOCTYPE html>
<title>Title of the document</title>

The content of the document......


Starting with text and no head node
some text <!DOCTYPE html>
<title>Title of the document</title>

The content of the document......


An alternative, also acceptable, result:

Starting with text
<!DOCTYPE html>
some text
<title>Title of the document</title>

The content of the document......


Starting with text and no head node
<!DOCTYPE html>
some text
<title>Title of the document</title>

The content of the document......


Actual result:
Starting with text
<!DOCTYPE html>
<p>some text 

<title>Title of the document</title>
The content of the document......


Starting with text and no head node
<!DOCTYPE html>
<p>some text 

<not_head><title>Title of the document</title></not_head></p>
The content of the document......
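libxml2 typically reports some of the structural repairs it makes as parser warnings, so collecting them next to the actual result can show where the input was rewritten. A minimal sketch with a hypothetical helper load_html_with_errors(), built on the real libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors() functions; the exact messages, and whether any appear at all, vary between libxml2 versions:

```php
<?php
// Hypothetical helper: load HTML while collecting libxml2's parser
// diagnostics instead of letting them surface as PHP warnings.
function load_html_with_errors(string $html): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors internally
    libxml_clear_errors();                        // drop errors left over from earlier calls

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {     // each entry is a LibXMLError
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting
    return [$dom, $messages];
}

[$dom, $messages] = load_html_with_errors("some text <!DOCTYPE html>\n<html><head><title>T</title></head><body>Body</body></html>");
foreach ($messages as $message) {
    echo $message, "\n";
}
```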



 [2021-01-28 10:55 UTC]
Status: Open → Not a bug; Assigned To: (none) → cmb
 [2021-01-28 10:55 UTC]
> The behaviour is different for some versions of libxml:

So this is apparently an upstream (i.e. libxml2) issue. Consider
reporting it there.
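Since the report was closed as an upstream issue, code that receives such input may want to detect it before calling loadHTML(). A minimal sketch of a hypothetical guard, starts_with_text(), that flags input whose first non-whitespace character is not "<"; what to do with flagged input (reject it, log it, or restructure it) is left to the caller:

```php
<?php
// Hypothetical guard: returns true when the first non-whitespace character
// is not "<", i.e. the input starts with bare text, which is the case in
// which loadHTML() was seen to wrap the text in <p> and drop <html>/<head>.
// Note: a UTF-8 byte order mark is not stripped and would count as text.
function starts_with_text(string $html): bool
{
    $trimmed = ltrim($html); // strips spaces, tabs, newlines, NUL and vertical tab
    return $trimmed !== '' && $trimmed[0] !== '<';
}

$inputs = [
    "some text <!DOCTYPE html>\n<html></html>",
    "<!DOCTYPE html>\n<html></html>",
];
foreach ($inputs as $input) {
    echo starts_with_text($input) ? "starts with text\n" : "starts with markup\n";
}
```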
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Wed Feb 05 17:01:30 2025 UTC