[2021-01-28 10:55 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
[2021-01-28 10:55 UTC] cmb@php.net
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Sun Nov 02 09:00:01 2025 UTC
Description:
------------
If a document starts with text, loading it with loadHTML() corrupts some element nodes:

1. The text at the beginning of the document is wrapped in <p></p> tags.
2. The text inside the <p></p> tags is placed after the document type declaration.
3. The <html> node in the document is lost, but its child nodes are preserved.
4. The <head> node is lost, but its child nodes are preserved.
5. If the document has no head element but has another element in its place, that element is included in the <p></p> tags together with the starting text.

The behaviour differs between libxml versions:

libxml 2.9.1: the starting text is enclosed in <p></p> tags, and the <head> element's child nodes are placed before the <body> node.
libxml 2.9.4: the starting text is enclosed in <p></p> tags, and the <head> element's child nodes are included inside the <body> node.

Test script:
---------------
<?php

$html = "some text
<!DOCTYPE html>
<html>
<head>
<title>Title of the document</title>
</head>

<body>
The content of the document......
</body>

</html>";

echo "\n\nStarting with text\n";
echo "----------------------\n";
$dom = new \DOMDocument();
$dom->formatOutput = true;
$dom->loadHTML($html);
echo $dom->saveHTML();

// Rename <head> to <not_head> so the document has no head element.
$html = str_replace('head>', 'not_head>', $html);

echo "\n\nStarting with text and no head node\n";
echo "----------------------\n";
$dom = new \DOMDocument();
$dom->formatOutput = true;
$dom->loadHTML($html);
echo $dom->saveHTML();

Expected result:
----------------
Starting with text
----------------------
some text
<!DOCTYPE html>
<html>
<head>
<title>Title of the document</title>
</head>
<body>
The content of the document......
</body>
</html>

Starting with text and no head node
----------------------
some text
<!DOCTYPE html>
<html>
<not_head>
<title>Title of the document</title>
</not_head>
<body>
The content of the document......
</body>
</html>

Also some kind of expected result:
==================================
Starting with text
----------------------
<!DOCTYPE html>
some text
<html>
<head>
<title>Title of the document</title>
</head>
<body>
The content of the document......
</body>
</html>

Starting with text and no head node
----------------------
<!DOCTYPE html>
some text
<html>
<not_head>
<title>Title of the document</title>
</not_head>
<body>
The content of the document......
</body>
</html>

Actual result:
--------------
Starting with text
----------------------
<!DOCTYPE html>
<html><body>
<p>some text
</p>
<title>Title of the document</title>
The content of the document......
</body></html>

Starting with text and no head node
----------------------
<!DOCTYPE html>
<html><body>
<p>some text
<not_head><title>Title of the document</title></not_head></p>
The content of the document......
</body></html>
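
For anyone hitting the same behaviour: one possible way to sidestep it is to separate the stray leading text from the markup before calling loadHTML(). The sketch below is only an illustration and is not taken from this report. `splitLeadingText()` is a hypothetical helper (it is not part of PHP or libxml), and it assumes the document start can be found by searching for the first `<!DOCTYPE` or `<html`. What to do with the leading text afterwards is left to the caller.

```php
<?php
// Hypothetical helper (not part of PHP or libxml): separates any text that
// precedes the doctype or the <html> tag from the rest of the markup.
function splitLeadingText(string $html): array
{
    // Locate the first "<!DOCTYPE" or "<html", case-insensitively.
    if (preg_match('/<!DOCTYPE\b|<html\b/i', $html, $m, PREG_OFFSET_CAPTURE)) {
        $offset = $m[0][1];
        return [substr($html, 0, $offset), substr($html, $offset)];
    }
    // No document start found: treat the whole input as markup.
    return ['', $html];
}

$html = "some text\n<!DOCTYPE html>\n<html>\n<head>\n"
      . "<title>Title of the document</title>\n</head>\n"
      . "<body>\nThe content of the document......\n</body>\n</html>";

[$leading, $markup] = splitLeadingText($html);

// Parse only the markup part; guarded in case ext-dom is not loaded.
if (class_exists('DOMDocument')) {
    $dom = new DOMDocument();
    $dom->loadHTML($markup);
    echo $dom->saveHTML();
}
```

With the input from the test script, `$leading` holds the text before the doctype and `$markup` starts at `<!DOCTYPE html>`, so the parser no longer has to recover from content that precedes the document declaration.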