[2011-08-06 06:37 UTC] keithm at aoeex dot com
Description:
------------
DOMDocument::loadHTMLFile() appears to urldecode its argument, which causes
problems when attempting to load a file whose name contains a %xx sequence.
This issue was brought up in ##php on Freenode when someone was attempting to load
a file named 'Linux_Files%2Fetc%2Fbash.bashrc.html'. The suggested workaround was to
use loadHTML() + file_get_contents() instead.
There was a small debate over whether this is a bug or just a documentation
problem (perhaps loadHTMLFile() expects a URL rather than a plain path).
DOMDocument::load() is also affected.
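The workaround mentioned above could look roughly like the following. This is a minimal sketch, not an official API: the helper name loadHtmlFromLiteralPath() is hypothetical, and it assumes the file is readable and non-empty.

```php
<?php
// Hypothetical helper (not part of PHP or the DOM extension): loads an HTML
// file whose name may contain literal %xx sequences.
function loadHtmlFromLiteralPath(string $path): DOMDocument
{
    // file_get_contents() treats $path as a plain filesystem path,
    // so '%2F' stays part of the file name instead of being decoded.
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Unable to read file: $path");
    }

    $doc = new DOMDocument();
    // loadHTML() parses a string, so the file name never reaches libxml.
    if (!$doc->loadHTML($html)) {
        throw new RuntimeException("Unable to parse HTML from: $path");
    }
    return $doc;
}
```

With the file from the test script below, loadHtmlFromLiteralPath('Linux_Files%2Fetc%2Fbash.bashrc.html') should return a document containing one body element, without the file name having to be escaped first.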
Test script:
---------------
Contents of 'Linux_Files%2Fetc%2Fbash.bashrc.html'
---------------------------------------8<---------------------------------------
<html>
<head>
<title></title>
</head>
<body>
</body>
</html>
---------------------------------------8<---------------------------------------
contents of 'test.php'
---------------------------------------8<---------------------------------------
<?php
$file = 'Linux_Files%2Fetc%2Fbash.bashrc.html';
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
var_dump($doc->getElementsByTagName('body')->length);
echo str_repeat('-', 80), "\r\n";
$doc2 = new DOMDocument();
$doc2->loadHTMLFile(urlencode($file));
var_dump($doc2->getElementsByTagName('body')->length);
---------------------------------------8<---------------------------------------
Expected result:
----------------
Expect ->loadHTMLFile($file) to succeed and
->loadHTMLFile(urlencode($file)) to fail with a file-not-found type error.
Actual result:
--------------
->loadHTMLFile($file) fails with errors:
PHP Warning: DOMDocument::loadHTMLFile(): I/O warning : failed to load external
entity "Linux_Files%2Fetc%2Fbash.bashrc.html" in /home/kicken/test.php on line 6
Warning: DOMDocument::loadHTMLFile(): I/O warning : failed to load external entity
"Linux_Files%2Fetc%2Fbash.bashrc.html" in /home/kicken/test.php on line 6
->loadHTMLFile(urlencode($file)) succeeds.
Hmm, this looks like a bug to me. At least the behavior is inconsistent with
the general file functions, and it is not directly related to libxml2.
Actually, the bug fix appears to be trivial:
 ext/libxml/libxml.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ext/libxml/libxml.c b/ext/libxml/libxml.c
index fc194770e1..68fd0e37c7 100644
--- a/ext/libxml/libxml.c
+++ b/ext/libxml/libxml.c
@@ -305,7 +305,7 @@ static void *php_libxml_streams_IO_open_wrapper(const char *filename, const char
 	uri = xmlParseURI(filename);
-	if (uri && (uri->scheme == NULL ||
+	if (uri && (
 			(xmlStrncmp(BAD_CAST uri->scheme, BAD_CAST "file", 4) == 0))) {
 		resolved_path = xmlURIUnescapeString(filename, 0, NULL);
 		isescaped = 1;
But given the long-standing behavior, and the BC break, this shouldn't be
fixed for any stable version (and probably only in a new major version), so
yes, the behavior should be documented.
And to be clear: this affects all XML extensions which use libxml2, not only
DOM.
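To illustrate the unescaping step discussed in this comment, here is a rough PHP-level analogue. It is only a sketch: rawurldecode() is used as a stand-in for libxml2's xmlURIUnescapeString(), and the variable names are made up for illustration.

```php
<?php
// Illustration only: rawurldecode() approximates what xmlURIUnescapeString()
// does to a scheme-less file name before the file is opened.
$file = 'Linux_Files%2Fetc%2Fbash.bashrc.html';

// The path that ends up being opened when $file is passed unescaped:
// 'Linux_Files/etc/bash.bashrc.html', which does not exist.
$opened = rawurldecode($file);

// Escaping first turns '%' into '%25':
// 'Linux_Files%252Fetc%252Fbash.bashrc.html'.
$escaped = urlencode($file);

// Unescaping the escaped name yields the real file name again, which is
// why the urlencode() call in the test script succeeds.
$roundTrip = rawurldecode($escaped);

var_dump($opened, $escaped, $roundTrip === $file);
```

This is consistent with the actual result in the report: the unescaped name resolves to a non-existent nested path, while the urlencode()'d name round-trips back to the file on disk.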