Bug #63769 file:// protocol does not support percent-encoded characters
Submitted: 2012-12-14 15:51 UTC Modified: 2013-01-16 17:40 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: hanskrentel at yahoo dot de Assigned:
Status: Not a bug Package: Streams related
PHP Version: 5.4.9 OS: Windows
Private report: No CVE-ID: None
 [2012-12-14 15:51 UTC] hanskrentel at yahoo dot de
Description:
------------
Using a file URI that contains percent-encoded characters (one byte/octet 
is encoded as a character triplet, e.g. space -> %20) does not work. The URI is 
not properly decoded.

Consider the following file on disk:

c:\temp\catalog 2.xml

PHP is able to find it via:

is_file('file:///C:/temp/catalog 2.xml');

However, commonly that file is written as:

file:///C:/temp/catalog%202.xml

And using that filename in PHP via:

is_file('file:///C:/temp/catalog%202.xml');

gives FALSE.

(Example is a libxml catalog file, properly specified for libxml)

When you're looking into this, it might also be worth checking + as an encoding 
for space, just so that this case does not get overlooked.




Test script:
---------------
<?php
touch($name = 'catalog 2.xml');
$uri = sprintf('file:///%s/%s', strtr(__DIR__, ['\\' => '/', ' ' => '%20']), rawurlencode($name));
printf('%s - %s (%d)', is_file($uri) ? 'OK' : 'FAIL', $uri, unlink($name));

Expected result:
----------------
OK - file:///C:/temp/catalog%202.xml (1)

Actual result:
--------------
FAIL - file:///C:/temp/catalog%202.xml (1)

History

 [2012-12-15 11:12 UTC] ab@php.net
-Status: Open +Status: Not a bug
 [2012-12-15 11:12 UTC] ab@php.net
Since both "catalog%202.xml" and "catalog 2.xml" are perfectly valid filenames, the fs function behaviour is correct. The user can decide where the name needs to be URL-encoded and where it does not.
 [2012-12-17 14:55 UTC] hanskrentel at yahoo dot de
I beg your pardon, but if both are perfectly valid filenames, the fs function 
behaviour is broken (and not correct), because it does not work with both 
perfectly valid filenames:

The path "file:///C:/temp/catalog%202.xml" (without the quotes) does *not* work.

The request is to remove this shortcoming and have it work. I thought this was 
clear from the initial report. Please let me know how I can further assist with 
this.
 [2012-12-20 16:27 UTC] anon at anon dot anon
>The path "file:///C:/temp/catalog%202.xml" (without the quotes) does *not* work.

Unfortunately it does work, if you create a file literally named catalog%202.xml. PHP's behavior is broken but it's impossible to alter it without breaking compatibility.
 [2013-01-06 01:29 UTC] hanskrentel at yahoo dot de
If you would create a file named

    catalog%202.xml.

You would have wanted to access it via:

    'file:///C:/temp/catalog%%25202.xml'

which as well does not work. I'm not doing a bug report here to be treated like 
an idiot. I better suggest you piss completely off 
instead of leaving such an idiotic comment. My 2 cents.
 [2013-01-06 06:38 UTC] anon at anon dot anon
>You would have wanted to access it via 'file:///C:/temp/catalog%%25202.xml'

Actually, 'catalog%25202.xml', but I know, I'm agreeing with you. I'm just pointing out that this erroneous behavior may be depended on somewhere in some PHP script, where the author, in good faith, did whatever made things work. I assume you're going to pass your path through urldecode (or not encode it in the first place), and then you'll be one of them.

In any case, you're unlikely to get any support here. The reviewers here don't do much except dismiss things as 'Not a bug' and once they've successfully done that they lose interest. C'est le PHP.
 [2013-01-06 07:03 UTC] anon at anon dot anon
Actually, hold on a sec, plus signs are *not* supposed to be decoded here. That means that file names containing plus signs would not be broken by a fix, and only file names containing a '%xx' (where x is a hexit) sequence would be affected, which is probably uncommon. Perhaps you have a chance.
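
The distinction drawn here between plus signs and '%xx' sequences can be illustrated with PHP's built-in rawurldecode() (RFC 3986 style, leaves '+' alone) and urldecode() (form-encoding style, turns '+' into a space). This is only a sketch of the two decoding styles, not of what the file:// wrapper does; the file names are made up for illustration.

```php
<?php
// Hypothetical file names, chosen only to show the two decoding styles.
$plus    = 'a+b.xml';          // contains a literal plus sign
$percent = 'catalog%202.xml';  // contains a %xx sequence

// rawurldecode() decodes %xx sequences but leaves '+' untouched.
$rawPlus    = rawurldecode($plus);     // 'a+b.xml'
$rawPercent = rawurldecode($percent);  // 'catalog 2.xml'

// urldecode() additionally turns '+' into a space (form encoding).
$formPlus = urldecode($plus);          // 'a b.xml'

var_dump($rawPlus, $rawPercent, $formPlus);
```

With RFC 3986 style decoding, only names that already contain a literal '%xx' sequence would change meaning, which matches the point made above.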
 [2013-01-08 17:12 UTC] ab@php.net
@hanskrentel

That's my test:

- create file 'catalog%202.xml' with content "percent filename"
- create file 'catalog 2.xml' with content "space filename"
- then run
php -r "echo file_get_contents('file://C:/my/path/catalog%202.xml');"
percent filename

- then run
php -r "echo file_get_contents('file://C:/my/path/catalog 2.xml');"
space filename

That's pretty straightforward. That's what I mean: no decoding, both are valid filenames. The decoding should be done in your app depending on what it needs. In your example you create 'catalog 2.xml' and then try to stat 'catalog%202.xml', literally. But 'catalog%202.xml' doesn't exist.
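
Doing the decoding in the application, as suggested here, could look roughly like the following. This is a minimal sketch: file_uri_to_path() is a hypothetical helper (not a PHP built-in), and it assumes a file URI with an empty or absent host, such as file:///C:/temp/x or file:///tmp/x.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): turn a file:// URI into a local
 * path by percent-decoding it, so the result can be passed to is_file() etc.
 * Assumes an empty or absent host in the URI.
 */
function file_uri_to_path(string $uri): string
{
    if (strncasecmp($uri, 'file://', 7) !== 0) {
        throw new InvalidArgumentException("Not a file URI: $uri");
    }
    // Drop the scheme and the (empty) authority, keeping the leading slash.
    $path = substr($uri, 7);
    // rawurldecode() follows RFC 3986: %20 becomes a space, '+' stays '+'.
    $path = rawurldecode($path);
    // Windows form: "/C:/temp/..." becomes "C:/temp/...".
    if (preg_match('~^/[A-Za-z]:/~', $path) === 1) {
        $path = substr($path, 1);
    }
    return $path;
}

// The decoded path, not the URI, is what gets handed to the fs functions.
$path = file_uri_to_path('file:///C:/temp/catalog%202.xml');
var_dump($path, is_file($path));
```

The decoded path is then passed to is_file() or file_get_contents() directly, without the file:// prefix, so no assumption is made about how the stream wrapper treats '%'.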
 [2013-01-16 16:18 UTC] hanskrentel at yahoo dot de
@ab:

Consider you have a file containing a space in the filename, and you *need to* specify the filename in form of a file:// URI for which space is a special character that needs proper URL-encoding.

That file:// URI, btw, is set in an environment variable that requires it (XML_CATALOG_FILES).

DOMDocument in PHP then uses that file:// URI internally and can't process it properly, because the wrapper is not able to decode the path information.

You actually pretty well demonstrate the problem in your example:

php -r "echo file_get_contents('file://C:/my/path/catalog%202.xml');"
percent filename

Is obviously wrong. %20 in a file:// URI is an encoded space, so the content

space filename

needs to be output instead. The filename you meant is properly written as:

php -r "echo file_get_contents('file://C:/my/path/catalog%25202.xml');"
percent filename

compare: http://tools.ietf.org/html/rfc3986#section-2.1

Please add that example to yours, because only with the two opposite cases (encoded *and* decoded) can you actually work out concrete results. You just have the same example twice, and I think both show the same kind of error: missing encoding in those URIs.

Which brings me to the point: Is there actually any interest in fixing this? I mean there is not much standing in the way if you ask me. Normally users are not using file:// URIs at all.

Those who did most likely used the space (or would have complained earlier here, but I could find no bug report). The only edge case I can see is with files containing percent signs, but how likely is that at all?

Let me know what the chances of getting this fixed would be if I sponsored a well-written patch.
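
The two cases contrasted in this comment (a file name with a space versus a file name with a literal percent sign) can be sketched by building the URI from the path according to RFC 3986 section 2.1. path_to_file_uri() is a hypothetical helper (not a PHP built-in) and assumes an absolute local path.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): build a file:// URI from an
 * absolute local path by percent-encoding each path segment.
 */
function path_to_file_uri(string $path): string
{
    $path = str_replace('\\', '/', $path);
    $prefix = 'file://';
    // Keep a Windows drive letter such as "C:" unencoded.
    if (preg_match('~^([A-Za-z]:)(/.*)$~', $path, $m) === 1) {
        $prefix .= '/' . $m[1];
        $path = $m[2];
    }
    // Encode every segment separately so the '/' separators survive.
    $segments = array_map('rawurlencode', explode('/', $path));
    return $prefix . implode('/', $segments);
}

// Space in the name: encoded as %20.
echo path_to_file_uri('C:\\my\\path\\catalog 2.xml'), "\n";
// Literal percent sign in the name: encoded as %25, giving %2520.
echo path_to_file_uri('C:\\my\\path\\catalog%202.xml'), "\n";
```

Under this encoding the two files map to two different URIs, so neither name shadows the other.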
 [2013-01-16 17:40 UTC] pajoye@php.net
It is your job to decode it; file:// does not have, and does not follow, the % 
encoding used in other areas.

btw, paths on Windows can contain the %; see 
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx 
for a list of disallowed characters.
 [2013-01-16 19:41 UTC] hanskrentel at yahoo dot de
Pierre, not helpful. Should I say "as usual"?

Let me explain briefly, so you can see how easily you fool yourself:

You point to some standard reference (here, MSDN) to make the argument that % can be part of a file name. I never denied that. So what do you want to say with that link? Probably that there is some standardization of file names inside an OS?

Well that's fine.

My point is that a standard, namely URI standardization, is not properly implemented in PHP. A very common standard, btw: although the file:// URI scheme is not really standardized, URIs and percent-encoding *are*.

Now you bring in some other standard. You probably wanted to create the impression that PHP itself would actually follow that standard, but what should I tell you: just as it does not follow the URI standard (as pointed out in this issue), the Windows rules for valid file names aren't properly implemented either (!!!).

But this is not what my bug-report is about. 

Or was it that you just wanted to give the example that PHP does not even need the file-system file-naming rules because it makes its own? That it does not have to follow these because it's superior?
 
Last updated: Tue Oct 22 09:01:29 2019 UTC