Bug #71993 DOMDocument::loadHTML ignores anything after a null byte
Submitted: 2016-04-08 19:18 UTC
Modified: 2016-04-13 07:29 UTC
From: raul at raulr dot net
Assigned:
Status: Not a bug
Package: DOM XML related
PHP Version: 7.0.5
OS: Linux
Private report: No
CVE-ID: None
 [2016-04-08 19:18 UTC] raul at raulr dot net
Description:
------------
DOMDocument::loadHTML() stops processing the source string after a null byte (U+0000) without issuing any exception or warning.

I have tested that this happens in PHP versions 5.5.34, 5.6.20 and 7.0.5.

Test script:
---------------
<?php

$html = <<<EOD
<!DOCTYPE html>
<html>
<div>Pre NULL byte</div>
\0
<div>Post NULL byte</div>
</html>
EOD;

$dom = new \DOMDocument('1.0');
$dom->validateOnParse = true;
$dom->loadHTML($html);
echo $dom->saveHTML();

Expected result:
----------------
<!DOCTYPE html>
<html><body><div>Pre NULL byte</div>

<div>Post NULL byte</div>
</body></html>

Actual result:
--------------
<!DOCTYPE html>
<html><body><div>Pre NULL byte</div></body></html>

 [2016-04-13 07:29 UTC] ab@php.net
-Status: Open +Status: Not a bug
 [2016-04-13 07:29 UTC] ab@php.net
Thanks for the report. The input arrives as a binary-safe string in PHP, but we then re-evaluate its length using xmlStrlen(), which stops at the first null byte. Even without that step, libxml2 would keep the same behavior and silently stop. Thus it is not a bug in PHP; this should be re-evaluated once libxml2 has better HTML5 support. Please also see these links:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=574104
https://mail.gnome.org/archives/xml/2008-August/msg00008.html

Thanks.
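
For anyone hitting this behavior, a possible workaround is to strip null bytes from the input before handing it to loadHTML(). The sketch below assumes that dropping the null bytes is acceptable for your data. stripNulBytes() is a hypothetical helper name used only for illustration; it is not part of PHP or libxml2.

```php
<?php

/**
 * Hypothetical helper (not a PHP or libxml2 function): removes every
 * NUL byte so that a C-string based parser does not stop at the first one.
 */
function stripNulBytes(string $html): string
{
    return str_replace("\0", '', $html);
}

// Double-quoted strings interpret \0 as a NUL byte, like the heredoc
// in the test script.
$html = "<!DOCTYPE html>\n<html>\n<div>Pre NULL byte</div>\n\0\n"
      . "<div>Post NULL byte</div>\n</html>";

// Only touch the DOM extension if it is actually loaded.
if (class_exists('DOMDocument')) {
    $dom = new DOMDocument('1.0');
    $dom->loadHTML(stripNulBytes($html));
    echo $dom->saveHTML();
}
```

With the null byte removed, the parser should receive the full string, so the second div is no longer cut off. If null bytes in the input indicate corrupted or hostile data, rejecting the input outright may be a better choice than silently stripping them.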
 