Bug #16337 include() does not decode % correctly
Submitted: 2002-03-28 18:12 UTC Modified: 2002-07-10 22:38 UTC
Votes:7
Avg. Score:4.6 ± 0.5
Reproduced:7 of 7 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: tmorgan-spam at kavi dot com Assigned:
Status: Closed Package: HTTP related
PHP Version: 4.3.0-dev OS: Unix based
Private report: No CVE-ID: None
 [2002-03-28 18:12 UTC] tmorgan-spam at kavi dot com
When include() is called with the following syntax:

include("http://username:password@www.example.org/");

It is the duty of the include call to tokenize the username and password, and to urldecode each of them. Why? Because things would break if a username contained 'www.example.com/?var=' or, say, a password contained an '@'. So it is the duty of the caller to urlencode these tokens, and the duty of include() (or a subfunction) to decode them after parsing.

However, it has been observed in PHP 4.1.x that '%' characters (or their equivalent '%25') are not decoded properly.  Prior use of this feature leads us to believe the 4.0.x series of PHP does not have this problem.  
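
For illustration, a minimal sketch of the symptom described above. It assumes Basic authentication and a hypothetical user 'alice' with the password '100%secret'; neither appears in the original report. If the wrapper forwards the still-encoded password, the server receives credentials that differ from the real ones.

```php
<?php
// Hypothetical credentials, chosen only to show the effect of a '%'.
$username = 'alice';
$password = '100%secret';

// What the caller puts into the URL: '%' becomes '%25'.
$inUrl = urlencode($password);

// Credentials sent if the wrapper does NOT decode the token.
$sentWithoutDecoding = base64_encode($username . ':' . $inUrl);

// Credentials sent if the wrapper decodes the token first.
$sentWithDecoding = base64_encode($username . ':' . urldecode($inUrl));

// What the server expects to see.
$expected = base64_encode($username . ':' . $password);

var_dump($sentWithoutDecoding === $expected); // bool(false)
var_dump($sentWithDecoding === $expected);    // bool(true)
```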

We run websites with hundreds of users.  We would appreciate a quick response, because we would rather not force all users with '%'s in their passwords to change them.  Thank you.

 [2002-04-10 19:57 UTC] tmorgan-spam at kavi dot com
Correction: 
 PHP's fopen URL wrapper doesn't appear to decode ANY encodings at all.  Since the HTTP spec only excludes ':' from the username (and nothing at all from the password), this bug makes many username:password pairs unusable.
 [2002-04-10 20:01 UTC] tmorgan-spam at kavi dot com
In case anyone wondered, the HTTP spec I am referring to is at:
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2068.html
(Section 11.1)
Since RFC 2616 doesn't specify user-pass strings, I assume this older RFC still applies.
 [2002-04-13 19:11 UTC] shiflett@php.net
RFC 2068 is obsoleted by RFC 2616, so your assumption that RFC 2068 is still adhered to is incorrect.

The username:password string is base-64 encoded for Basic Authentication, which I'm going to assume you're using, since you didn't mention what type of authentication the server demands for access to http://www.example.org/. It is not URL encoded.

Now, as for your bug report, why in the world would the include function *decode* the username:password string? The content coming back from the server doesn't specify the username and password to be used; that's what *you* sent.

See RFC 2617 for more information about both Basic and Digest Authentication over HTTP. http://ietf.org/rfc/rfc2617.txt will point you there.
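
As a point of reference, a minimal sketch of how the Basic Authentication credentials described above are put on the wire. The function name basic_auth_header() is hypothetical; the only encoding applied at this layer is base64 over the raw "userid:password" string.

```php
<?php
// Hypothetical helper: build the Authorization header for HTTP Basic auth.
// The inputs are the raw credentials; no URL encoding happens here.
function basic_auth_header(string $username, string $password): string
{
    // RFC 2617 excludes ':' from the userid because it is the separator.
    if (strpos($username, ':') !== false) {
        throw new InvalidArgumentException("userid must not contain ':'");
    }
    return 'Authorization: Basic ' . base64_encode($username . ':' . $password);
}

echo basic_auth_header('USERNAME', 'PASSWORD'), "\n";
// prints "Authorization: Basic VVNFUk5BTUU6UEFTU1dPUkQ="
```

This shows only the HTTP side. How the credentials get from the URL string to this function is a separate parsing step.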

I think a better understanding of this might help you solve your problem on your own. However, if this does not help, it should at least allow you to better explain what the problem is, because your reports have too many holes for me to understand what problem you are having, whether a bug in PHP or not.
 [2002-04-15 14:04 UTC] tmorgan-spam at kavi dot com
Ok, so I think maybe I shouldn't have brought up the HTTP spec.  The *only* reason I did that was that HTTP has one small limitation on username:password pairs.  As far as I could find, the only restriction in all of the new and old RFCs (including 2617) is that ':' is disallowed in the username.  I have read of no limitations on what the password can contain.

Based on the above, I am going to make the conjecture that if the 'http://USERNAME:PASSWORD@...' syntax in PHP can't handle certain 'special' characters, then it is broken.  

The way I was trying to use this feature was like the following:
I was trying to include some headers and footers from my *local* webserver into my PHP script when it is displayed.  Why am I using a URL for this?  Well, because the script that dynamically generates those headers and footers is written in a different language.  It is a temporary fix for me while I port code.  In any case, the authentication realm that my PHP script lives in is the same realm as the headers and footers.  So, when a user hits the PHP script, the username and password variables are stuck together into a URL like: "http://$username:$password@www.example.org/header.cgi"
Then I use an include() call to grab the header.  Make sense so far?  

The problem is, PHP bombs when the user's password contains any special characters.  More specifically, if it contains characters that are normally considered special in URL terms.

Just suppose for a minute, that my password contains an '@'.  How is PHP to parse this?  Should it assume that the first or last @ found is the delimiter?  What if the username was equal to 'www.example.org/evilscript.cgi?var='?  Then we have a Cross-Site Scripting Vulnerability on our hands.

So, my reasoning for the urlencoding was that if the user was responsible, they would urlencode the username and password individually, thus making the URL actually parseable in any situation.  The only way this would work though, is if PHP unencoded those tokens after parsing them out of the URL.  Currently it does not.  Once again, this escaping problem is between the caller and the include() function, not between the PHP internals and HTTP.
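
The parse-then-decode behaviour argued for above could be sketched roughly as follows. This is an illustration, not PHP's actual parser: extract_credentials() is a hypothetical helper, and the sample username and password are made up to include URL-special characters.

```php
<?php
// Hypothetical helper: split the userinfo out of a URL, then decode each token.
// Assumes the caller url-encoded the username and password individually.
function extract_credentials(string $url): ?array
{
    // The authority is everything between "//" and the next "/", "?" or "#".
    if (!preg_match('~^[a-z][a-z0-9+.-]*://([^/?#]*)~i', $url, $m)) {
        return null;
    }
    $authority = $m[1];

    $at = strrpos($authority, '@');
    if ($at === false) {
        return null; // no credentials present
    }
    $userinfo = substr($authority, 0, $at);
    $host     = substr($authority, $at + 1);

    // Split on the first ':' only; an encoded ':' (%3A) is left alone.
    $tokens = explode(':', $userinfo, 2);

    return [
        'user' => urldecode($tokens[0]),
        'pass' => isset($tokens[1]) ? urldecode($tokens[1]) : null,
        'host' => $host,
    ];
}

// A hostile-looking username and a password full of special characters.
$username = 'www.example.org/evilscript.cgi?var=';
$password = 'p@ss%word:1';

$url = 'http://' . urlencode($username) . ':' . urlencode($password)
     . '@www.example.org/header.cgi';

print_r(extract_credentials($url));
```

Because every '/', '?', '@', ':' and '%' inside the tokens is escaped, the delimiters that remain in the URL are unambiguous, and the host stays www.example.org.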
 [2002-05-14 00:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2002-05-14 00:46 UTC] mfischer@php.net
Feedback was given, but status not changed. Reopening.
 [2002-05-14 01:42 UTC] tmorgan-spam at kavi dot com
We are still having issues with this.  If you require any additional explanation, or examples, I can give them.  Some indication of whether this is being worked on or not would be nice...
 [2002-07-09 19:51 UTC] sniper@php.net
First, try this snapshot:

http://snaps.php.net/php4-latest.tar.gz

And if this doesn't work like you think it should work,
provide a short but complete script which clearly (!) demonstrates the possible bug.

 [2002-07-09 20:34 UTC] tmorgan-spam at kavi dot com
You know, I would really like this bug fixed, but I am really frustrated by the attitude I am getting here.  Three and a half months have passed, and yet not a single developer at PHP has taken 10 minutes to attempt to replicate it themselves.  I know there are a lot of bugs in PHP that need fixing, but cmon, at least an assessment of when it is going to be fixed.  And if it IS fixed in the new release, then why don't you tell me that?

As for example code, well, I have already given the one line that is necessary, but I will try to make it plainer:

// GIVEN: user provided $username & $password
// What SHOULD work:
$clean_username = urlencode($username);
$clean_password = urlencode($password);
include("http://$clean_username:$clean_password@www.example.com/");


The url parser in include() needs to parse, then decode, those two strings before passing them to the HTTP session. If you want to know why this should be this way, read my previous comments.
 [2002-07-09 20:59 UTC] eru@php.net
Judging from the RFC, I'd say you're not using the appropriate encoding scheme. The RFC says:

base64-user-pass  = <base64 encoding of user-pass>
user-pass   = userid ":" password
userid      = *<TEXT excluding ":">
password    = *TEXT

So in your case this would be
$user_pass64 = base64_encode( $username.":".$password );
include("http://".$user_pass64."@www.example.com/");

 [2002-07-09 23:46 UTC] sniper@php.net
The bug can be found here:

It's partly php_url_parse's fault. It uses regexps
to separate the url parts..but will fail if you pass it 
a url like this:

http://username:pass@word@www.example.org/

This will end up as having username 'username' and password 'pass'. (host being word@www.example.org)

Now, if the username and password are urlencoded (or base64 encoded) the above mentioned regexp will work..but this will fail:

ext/standard/http_fopen_wrapper.c:150-165

As it goes and base64-encodes the user/password pair.

So, which should be fixed? Let user pass the username/password without modifying them first (fix the regexps) or require them to be urlencoded and decode them
before base64 encoding ??
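
The second option (require url-encoded credentials and decode them before base64 encoding) might look like the sketch below in userland PHP. It is not the C code in http_fopen_wrapper.c; auth_header_from_url() is a hypothetical name, and it relies on parse_url() returning the user and pass components still encoded.

```php
<?php
// Hypothetical helper: URL with url-encoded credentials -> Basic auth header.
function auth_header_from_url(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['user'])) {
        return null; // no credentials in the URL
    }

    // parse_url() does not decode the userinfo, so decode each token here,
    // before the pair is joined and base64-encoded.
    $user = urldecode($parts['user']);
    $pass = isset($parts['pass']) ? urldecode($parts['pass']) : '';

    return 'Authorization: Basic ' . base64_encode($user . ':' . $pass);
}

$header = auth_header_from_url('http://username:pass%40word@www.example.org/');

// Decode the header payload again to see what the server would receive.
echo base64_decode(substr($header, strlen('Authorization: Basic '))), "\n";
// prints "username:pass@word"
```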

 [2002-07-10 02:10 UTC] tmorgan-spam at kavi dot com
Regarding base64 encoding:

Yes, the HTTP spec does use b64 for wrapping up username:password pairs.  However, you must remember that A: the URL syntax you see in a browser is much different from the data that is passed to a webserver in HTTP.  
B: the http://USERNAME:PASSWORD@domain.tld/  syntax is more of a de facto standard made legit, and the parsing of it really has nothing to do with HTTP.

Let me explain more by example.  Say my username was the literal USERNAME, and my password was the literal PASSWORD.  then the encoded pair is: VVNFUk5BTUU6UEFTU1dPUkQ=

that would make my request URL look like:

http://VVNFUk5BTUU6UEFTU1dPUkQ=@www.example.org/

Isn't that a little weird?  Does the parser handle '=' before the @?  maybe, but I don't think this syntax is what is originally intended...

So, if instead you mean to encode each of the two tokens separately, then the parser would have to decode the b64 and re-encode them again with the ':' in the middle.

Therefore, does it really matter what encoding is used the first time through?  It is undone anyway.  To me, it only makes sense to use URL encoding, since that is what we are talking about here.  We need to escape characters that would otherwise be considered special in a URL.  
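
One concrete reason URL encoding fits better here: the base64 alphabet itself includes characters that are delimiters in a URL. A small sketch, using a made-up username 'a' and password '???' picked so the effect shows up; has_url_delimiter() is a hypothetical helper.

```php
<?php
// Hypothetical helper: does the string contain a character that would
// confuse the authority part of a URL?
function has_url_delimiter(string $s): bool
{
    return strpbrk($s, '/?#@:') !== false;
}

$username = 'a';
$password = '???';

// base64 of the joined pair, as the HTTP header would carry it.
$b64 = base64_encode($username . ':' . $password);
echo $b64, "\n"; // prints "YTo/Pz8="

// Placed in a URL, the '/' inside the base64 text ends the authority early.
var_dump(has_url_delimiter($b64)); // bool(true)

// URL-encoding each token escapes exactly the characters that matter.
var_dump(has_url_delimiter(urlencode($username))); // bool(false)
var_dump(has_url_delimiter(urlencode($password))); // bool(false)
```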

<rant>
Every syntax has a set of special characters.  When embedding user data that uses those characters, those characters must be escaped or quoted.  This is a simple principle that 90% of web developers haven't caught onto yet, and that is why the internet is plagued with cross site scripting vulnerabilities. (And SQL injection, and shell injection, and...)
</rant>
 [2002-07-10 02:18 UTC] tmorgan-spam at kavi dot com
In response to sniper's post:

As I said in my previous post, re: b64, it only makes sense to encode your user data in a format meant for escaping characters for *that* format.

Would you run mysql_escape_string() on a string that you wanted to quote to avoid cross-site scripting (XSS)?  Well, MAYBE that function will escape the necessary characters properly (which in this example it doesn't), but if MySQL changes its set of special characters, or its method of escaping them, then the code might break later.

So, what I am saying, is don't escape oranges with something that is meant to escape apples.  urlencode/urldecode are meant for URL special characters.  Base 64 is just a general purpose method for escaping odd characters, and yes, it might work, but it is nearly impossible for humans to read, even if no special characters existed in the username:password pair.

Thanks for looking into this guys.
 [2002-07-10 22:38 UTC] sniper@php.net
This bug has been fixed in CVS. You can grab a snapshot of the
CVS version at http://snaps.php.net/. In case this was a documentation 
problem, the fix will show up soon at http://www.php.net/manual/.
In case this was a PHP.net website problem, the change will show
up on the PHP.net site and on the mirror sites.
Thank you for the report, and for helping us make PHP better.

Any username/password are now urldecoded before passing
them on. 
 [2002-07-12 14:52 UTC] tmorgan-spam at kavi dot com
rock on, thanks for taking the time to listen, sniper.
 
Last updated: Fri Apr 19 10:01:28 2024 UTC