Doc Bug #55374 DOMDocument::LoadHTMLFile fails with %xx sequences in filename.
Submitted: 2011-08-06 06:37 UTC
Modified: 2021-04-08 14:35 UTC
Votes:11
Avg. Score:4.2 ± 0.8
Reproduced:9 of 10 (90.0%)
Same Version:2 (22.2%)
Same OS:4 (44.4%)
From: keithm at aoeex dot com
Assigned:
Status: Verified
Package: XML related
PHP Version: 5.4.0alpha3
OS: Linux
Private report: No
CVE-ID: None
 [2011-08-06 06:37 UTC] keithm at aoeex dot com
Description:
------------
DOMDocument::LoadHTMLFile appears to urldecode its argument, which causes 
problems when attempting to load a file whose name contains a %xx sequence.

This issue was brought up in ##php on freenode when someone was attempting to load 
a file named 'Linux_Files%2Fetc%2Fbash.bashrc.html'.  The suggested workaround was to 
use LoadHTML + file_get_contents instead.
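
A minimal sketch of that workaround, assuming the file is readable through the normal file functions. The helper name loadHtmlFromLiteralPath is hypothetical and not part of PHP; it reads the file by its literal on-disk name and hands the contents to loadHTML(), so the filename never passes through the libxml filename handling:

```php
<?php
// Hypothetical helper (not part of PHP): load an HTML file by its literal
// on-disk name, without passing the name to loadHTMLFile().
function loadHtmlFromLiteralPath(string $path): DOMDocument
{
    // file_get_contents() uses the name as given; %xx is not decoded.
    $html = @file_get_contents($path);
    if ($html === false || $html === '') {
        throw new RuntimeException("Unable to read HTML from: $path");
    }

    // Collect parser diagnostics instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        throw new RuntimeException("Unable to parse HTML from: $path");
    }
    return $doc;
}
```

Called with 'Linux_Files%2Fetc%2Fbash.bashrc.html', this should return a document whose body element can be queried as in the test script, provided that file exists in the working directory.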

There was a small debate over whether this is a bug, or just a documentation 
problem (perhaps LoadHTMLFile expects a URL).

DOMDocument::Load() is also affected.

Test script:
---------------
Contents of 'Linux_Files%2Fetc%2Fbash.bashrc.html'

---------------------------------------8<---------------------------------------
<html>
 <head>
  <title></title>
 </head>
 <body>
 </body>
</html>
---------------------------------------8<---------------------------------------


contents of 'test.php'
---------------------------------------8<---------------------------------------
<?php

$file = 'Linux_Files%2Fetc%2Fbash.bashrc.html';

$doc = new DOMDocument();
$doc->loadHTMLFile($file);
var_dump($doc->getElementsByTagName('body')->length);

echo str_repeat('-', 80), "\r\n";

$doc2 = new DOMDocument();
$doc2->loadHTMLFile(urlencode($file));
var_dump($doc2->getElementsByTagName('body')->length);
---------------------------------------8<---------------------------------------


Expected result:
----------------
Expect ->loadHTMLFile($file) to succeed and
->loadHTMLFile(urlencode($file)) to fail with a file-not-found type error.

Actual result:
--------------
->loadHTMLFile($file) fails with errors:

PHP Warning:  DOMDocument::loadHTMLFile(): I/O warning : failed to load external 
entity "Linux_Files%2Fetc%2Fbash.bashrc.html" in /home/kicken/test.php on line 6

Warning: DOMDocument::loadHTMLFile(): I/O warning : failed to load external entity 
"Linux_Files%2Fetc%2Fbash.bashrc.html" in /home/kicken/test.php on line 6


->loadHTMLFile(urlencode($file)) succeeds.


History

 [2013-12-02 16:34 UTC] mike@php.net
-Type: Bug
+Type: Documentation Problem
 [2021-04-08 14:35 UTC] cmb@php.net
-Status: Open
+Status: Verified
-Package: DOM XML related
+Package: XML related
 [2021-04-08 14:35 UTC] cmb@php.net
Hmm, this looks like a bug to me.  At least the behavior is
inconsistent with the general file functions, and it is not
directly related to libxml2.  Actually, the bug fix appears to be
trivial:


 ext/libxml/libxml.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ext/libxml/libxml.c b/ext/libxml/libxml.c
index fc194770e1..68fd0e37c7 100644
--- a/ext/libxml/libxml.c
+++ b/ext/libxml/libxml.c
@@ -305,7 +305,7 @@ static void *php_libxml_streams_IO_open_wrapper(const char *filename, const char
 
 
 	uri = xmlParseURI(filename);
-	if (uri && (uri->scheme == NULL ||
+	if (uri && (
 			(xmlStrncmp(BAD_CAST uri->scheme, BAD_CAST "file", 4) == 0))) {
 		resolved_path = xmlURIUnescapeString(filename, 0, NULL);
 		isescaped = 1;


But given the long-standing behavior, and the BC break, this
shouldn't be fixed for any stable version (and probably only in a
new major version), so yes, the behavior should be documented.

And to be clear: this affects all XML extensions which use
libxml2, not only DOM.
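
To illustrate the condition in the diff above, here is a simplified userland model of the decision, not the actual C code. It assumes parse_url() as a stand-in for xmlParseURI and rawurldecode() as a stand-in for xmlURIUnescapeString; neither is an exact equivalent, and the function name modelResolvedName is hypothetical. Any further path handling done by the real wrapper is left out.

```php
<?php
// Simplified model (an assumption, not the real implementation) of which
// name the libxml stream open wrapper ends up trying to open.
function modelResolvedName(string $filename): string
{
    // Stand-in for xmlParseURI(); returns null when there is no scheme
    // and false when the string cannot be parsed at all.
    $scheme = parse_url($filename, PHP_URL_SCHEME);
    if ($scheme === false) {
        return $filename; // unparseable: the name is used as given
    }

    // Mirrors: uri->scheme == NULL || scheme starts with "file".
    if ($scheme === null || strncmp($scheme, 'file', 4) === 0) {
        return rawurldecode($filename); // %xx sequences are unescaped
    }

    return $filename; // other schemes are left untouched
}

$file = 'Linux_Files%2Fetc%2Fbash.bashrc.html';
echo modelResolvedName($file), "\n";            // name after unescaping
echo modelResolvedName(urlencode($file)), "\n"; // name after unescaping
```

Under this model the plain name resolves to 'Linux_Files/etc/bash.bashrc.html', which does not exist, while the urlencode()d name resolves back to the literal 'Linux_Files%2Fetc%2Fbash.bashrc.html'. That matches the results reported in the original test script.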
 