Request #39078 Plus sign in URL arg received as space
Submitted: 2006-10-07 18:54 UTC Modified: 2010-10-27 17:28 UTC
Votes:24
Avg. Score:4.1 ± 0.9
Reproduced:22 of 23 (95.7%)
Same Version:9 (40.9%)
Same OS:6 (27.3%)
From: main at springtimesoftware dot com Assigned:
Status: Not a bug Package: *General Issues
PHP Version: 5.1.6 OS: Windows XP
Private report: No CVE-ID:
 [2006-10-07 18:54 UTC] main at springtimesoftware dot com
Description:
------------
I searched the bug database, but could not find this problem addressed.

This is a simple case, using default configurations, in which client-side JavaScript sends a plus sign followed by a space to the server as an argument in a URL.

The script constructs the URL using the JavaScript 'escape' function, as recommended: 

URL='www.example.com/example.php?Arg='+escape('+ ');
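As a rough illustration of what escape() puts on the wire, here is a Python emulation. It is a sketch, and the helper name js_escape is hypothetical. It assumes the behavior described in Annex B of the ECMAScript specification: ASCII letters, digits and @*_+-./ pass through unchanged, other UTF-16 code units below 256 become %XX, and the rest become %uXXXX. Note that '+' is among the characters left alone.

```python
# Rough emulation of JavaScript's legacy escape() (hypothetical helper, a sketch).
import string

# Characters escape() leaves untouched, per ECMAScript Annex B.
_ESCAPE_SAFE = set(string.ascii_letters + string.digits + "@*_+-./")


def js_escape(s: str) -> str:
    """Percent-escape s the way JavaScript's escape() does."""
    # JavaScript strings are UTF-16 code units, so walk the string in those units.
    data = s.encode("utf-16-le", "surrogatepass")
    out = []
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "little")
        ch = chr(unit)
        if ch in _ESCAPE_SAFE:
            out.append(ch)                  # left as-is, including '+'
        elif unit < 256:
            out.append("%%%02X" % unit)     # e.g. space -> %20
        else:
            out.append("%%u%04X" % unit)    # e.g. euro sign -> %u20AC
    return "".join(out)


print(js_escape("+ "))  # prints: +%20
```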

The server, running Apache and PHP, automatically applies urldecode (at least I think it does; I could not find this documented in the PHP manual, even after a lot of searching).

The PHP code

$Arg=$_GET["Arg"];

receives the string as "  " (two spaces) instead of the expected "+ ". This is not a bug but the documented behavior of urldecode!
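The two decoding rules can be compared in Python, used here purely as an analogy. urllib.parse.unquote_plus applies form-style decoding like PHP's urldecode(), so a '+' becomes a space. unquote only decodes %XX escapes, like rawurldecode(). This sketches the semantics, not PHP's implementation.

```python
from urllib.parse import unquote, unquote_plus

received = "+%20"  # what escape('+ ') puts on the wire

# Form-style decoding (like PHP's urldecode / $_GET): '+' becomes a space.
form_decoded = unquote_plus(received)
# Plain percent-decoding (like PHP's rawurldecode): '+' is kept.
raw_decoded = unquote(received)

print(repr(form_decoded))  # prints: '  '
print(repr(raw_decoded))   # prints: '+ '
```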

My feature request is this: add a runtime-accessible configuration option to suppress the default decoding of the GET, POST, and other such arrays. The programmer could then use rawurldecode to decode arguments properly.

Note: Although I only mentioned the plus sign and space above, I really want to pass a string that can contain characters with any byte value from 0 to 255. This is to support cryptographic protocols.
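For arbitrary bytes, one approach works regardless of which decoder the server applies: percent-encode every byte outside the unreserved set, including '+' and space. The Python sketch below uses only the standard library. Using quote_from_bytes is an assumption about how a client might do the encoding. It shows that form-style decoding and plain percent-decoding then both return the original 256 bytes.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

payload = bytes(range(256))  # every possible byte value

# Encode everything except unreserved characters (A-Z a-z 0-9 - _ . ~).
encoded = quote_from_bytes(payload, safe="")

# No literal '+' or space survives, so the '+'-as-space rule has nothing to act on.
assert "+" not in encoded and " " not in encoded

# Plain percent-decoding restores the original bytes ...
raw_roundtrip = unquote_to_bytes(encoded)
# ... and so does form-style decoding (replace '+' by space, then percent-decode).
form_roundtrip = unquote_to_bytes(encoded.replace("+", " "))

print(raw_roundtrip == payload, form_roundtrip == payload)  # prints: True True
```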

David Spector
Springtime Software




Reproduce code:
---------------
Let me know if you really need a test case. It would include a client page and a server page.


History

 [2006-10-07 20:27 UTC] tony2001@php.net
That's why you should use urlencode().
 [2006-10-07 20:38 UTC] main at springtimesoftware dot com
JavaScript does not support urlencode.

Perhaps if you read the bug report again?

David
 [2006-10-07 20:48 UTC] tony2001@php.net
JavaScript has the escape() method for exactly that.
 [2006-10-07 20:58 UTC] main at springtimesoftware dot com
The JavaScript escape function does not do what the urlencode function does.

If it did, then escape on the JavaScript side would match urldecode on the PHP side, and this problem would not exist.

If you apply escape to "+ ", you get "+ ". On the PHP side, PHP automatically applies urldecode, and you get "  ".

So, the problem is that plus sign does not get through to the PHP script.

Is that clear now?

David
 [2006-10-07 21:02 UTC] main at springtimesoftware dot com
Oops, I should have said that escape('+ ') gives '+%20'.

On the PHP side, the "+" is considered an alias for " ", so the script sees "  " (urldecode converts "+" into " " and "%20" into " ").

I stand by the wording in the feature request.

David
 [2006-10-07 21:39 UTC] tony2001@php.net
PHP receives POST/GET data from Apache in decoded form, so if JavaScript doesn't encode the "+" sign, we can't fix it or even change it in any way.
 [2006-10-07 21:44 UTC] main at springtimesoftware dot com
So you are saying that this problem is definitely in Apache, not in PHP?

You are saying that Apache converts plus signs into spaces?

Please confirm this, it is hard to believe.

David
 [2006-10-07 22:15 UTC] derick@php.net
It's not hard to believe... it's what the RFC states:
http://www.freesoft.org/CIE/RFC/1738/4.htm, read the section "unsafe".
 [2006-10-07 22:53 UTC] main at springtimesoftware dot com
I'm not sure I'm following you.

Section "Reserved:" in RFC 1738 (at http://www.freesoft.org/CIE/RFC/1738/4.htm) states:

----
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
----

Since "+" is listed, I would expect that any agent that obeyed this RFC would transmit "+" unchanged.

That means that Apache should transmit "+" unchanged to PHP.

This is why I would be surprised to find that Apache is the cause of this problem.

Indeed, if I use IE 6.0 to browse to a page that calls phpinfo(), using a URL that contains the argument "Arg=+%20", phpinfo() reports that _SERVER["QUERY_STRING"] has the value "Arg=+%20". (I just did this; I'm not making this up.)

This confirms that the plus sign is getting to PHP okay.

So wouldn't you agree with me that Apache cannot be causing this problem?

PHP must be using urldecode() when it parses the arguments into the $_GET array, yes? Otherwise, how would the plus sign in the argument become a space?

David
 [2006-10-10 13:30 UTC] main at springtimesoftware dot com
So, that's it? Just a few ignorant attempts to classify this feature request as Bogus, with no assignment to a developer to make this feature request happen?

I'm disappointed.

An option to process incoming URL args using rawurldecode instead of urldecode would benefit so many users!

David Spector
 [2008-06-12 00:25 UTC] jerm at live dot com
I'm with David on this.

On the client side, I'm using the JavaScript escape() function to encode data sent to the server in an Ajax POST request. (The original bug report refers to $_GET, but this also affects $_POST.)

The server sees both plus signs "+" and "%20" as spaces. And yes, PHP is seeing the plus, untouched by Apache, as I can prove using:

echo file_get_contents("php://input"); // Display raw POST

This is very frustrating. I'm currently getting around this by parsing the raw POST data manually (above), and not using the pre-parsed $_POST data.
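The workaround described above parses the raw data instead of using the pre-parsed array. It might look like the following Python sketch: split on '&' and '=', and apply only percent-decoding so that '+' stays a plus sign. The helper name parse_raw_pairs is hypothetical. In PHP, the same idea would apply rawurldecode() to each part of php://input or $_SERVER['QUERY_STRING'].

```python
from urllib.parse import unquote


def parse_raw_pairs(body: str) -> dict:
    """Split a name=value&name=value string, decoding only %XX escapes,
    so a literal '+' is kept as a plus sign (sketch, hypothetical helper)."""
    result = {}
    for pair in body.split("&"):
        if not pair:
            continue  # skip empty segments such as a trailing '&'
        name, _, value = pair.partition("=")
        # Later duplicates overwrite earlier ones, as in $_GET for plain names.
        result[unquote(name)] = unquote(value)
    return result


print(parse_raw_pairs("Arg=+%20&q=C++"))  # prints: {'Arg': '+ ', 'q': 'C++'}
```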
 [2008-07-16 20:18 UTC] edA-qa at disemia dot com
I would also like to add that decoding '+' to a space is just plain wrong. I got burnt by this again when using base64_encode, whose output I expected to be URL-safe; it isn't, since it may include '+'.
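Standard Base64 can emit '+', '/' and '=', so its output is not URL-safe as-is. The Python sketch below shows the failure and one common alternative: the URL-safe alphabet from RFC 4648, which replaces '+' and '/' with '-' and '_'. The sample bytes are chosen only because they encode to a string containing '+'.

```python
import base64
from urllib.parse import unquote_plus

data = b"\xfb\xff"  # chosen because its standard Base64 form contains '+'

standard = base64.b64encode(data).decode("ascii")          # '+/8='
url_safe = base64.urlsafe_b64encode(data).decode("ascii")  # '-_8='

# Form-style decoding (what $_GET/$_POST apply) corrupts the standard alphabet ...
print(repr(unquote_plus(standard)))  # prints: ' /8='
# ... but leaves the URL-safe alphabet intact, so it decodes back to the input.
print(base64.urlsafe_b64decode(unquote_plus(url_safe)) == data)  # prints: True
```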

A global option to use the proper rawurldecode would be great. Otherwise I'm stuck, like many developers, reparsing the query string/URL manually and unable to use $_POST and $_GET.
 [2009-08-10 15:02 UTC] boriss at web dot de
I'd like to see an option to change this runtime behavior of PHP, too. Even if the JavaScript escape() function worked, a user could still type a URL with a query string by hand. Imagine you run a search engine and someone enters a URL with ?query=C++. If you use $_GET['query'], you can't tell whether they searched for "C++" or "C  ".
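The ambiguity exists only when '+' is sent literally. In the Python sketch below, urllib.parse.urlencode stands in for a client that encodes correctly, so "C++" travels as C%2B%2B. Form-style parsing (parse_qs, analogous to $_GET) then returns it intact. A hand-typed ?query=C++ is instead read as "C" followed by two spaces.

```python
from urllib.parse import parse_qs, urlencode

# A correctly encoding client turns '+' into %2B.
encoded = urlencode({"query": "C++"})
print(encoded)  # prints: query=C%2B%2B

# Form-style parsing recovers the original text ...
print(parse_qs(encoded))      # prints: {'query': ['C++']}
# ... whereas a literal '+' typed by hand is read as a space.
print(parse_qs("query=C++"))  # prints: {'query': ['C  ']}
```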
 [2009-10-06 17:05 UTC] toby dot walsh at fxhome dot com
I believe Derick meant to link to RFC 2396:

http://www.ietf.org/rfc/rfc2396.txt

It says...

----
Many URI include components consisting of or delimited by, certain
   special characters.  These characters are called "reserved", since
   their usage within the URI component is limited to their reserved
   purpose.  If the data for a URI component would conflict with the
   reserved purpose, then the conflicting data must be escaped before
   forming the URI.

      reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | ","
----

Notice that the "+" symbol is now in the reserved list.

This issue is confusing because the old RFC 1738 did indeed say that the "+" symbol did not need to be encoded. RFC 2396 explicitly draws attention to this change:

----
G.2. Modifications from both RFC 1738 and RFC 1808

Changed to URI syntax instead of just URL.

Confusion regarding the terms "character encoding", the URI
"character set", and the escaping of characters with %<hex><hex>
equivalents has (hopefully) been reduced.  Many of the BNF rule names
regarding the character sets have been changed to more accurately
describe their purpose and to encompass all "characters" rather than
just US-ASCII octets.  Unless otherwise noted here, these
modifications do not affect the URI syntax.

Both RFC 1738 and RFC 1808 refer to the "reserved" set of characters
as if URI-interpreting software were limited to a single set of
characters with a reserved purpose (i.e., as meaning something other
than the data to which the characters correspond), and that this set
was fixed by the URI scheme.  However, this has not been true in
practice; any character that is interpreted differently when it is
escaped is, in effect, reserved.  Furthermore, the interpreting
engine on a HTTP server is often dependent on the resource, not just
the URI scheme.  The description of reserved characters has been
changed accordingly.

The plus "+", dollar "$", and comma "," characters have been added to
those in the "reserved" set, since they are treated as reserved
within the query component.
----

So I believe PHP is correct to decode the "+" as a " ".

You should be using the JavaScript function encodeURIComponent() to escape your strings; it encodes "+" as %2B. Here's a good page that shows the differences between JavaScript's encoding functions:

http://xkr.us/articles/javascript/encode-compare/
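encodeURIComponent() can be approximated in Python with urllib.parse.quote and safe="!*'()". This rests on an assumption worth checking: quote always keeps letters, digits and -_.~, and together with !*'() these match the characters encodeURIComponent leaves unescaped. The helper name encode_uri_component is hypothetical. With this encoding, the '+' reaches the server as %2B and survives form-style decoding.

```python
from urllib.parse import parse_qs, quote


def encode_uri_component(s: str) -> str:
    """Approximation of JavaScript's encodeURIComponent() (hypothetical helper):
    UTF-8 encode, then escape everything except A-Z a-z 0-9 - _ . ! ~ * ' ( )."""
    return quote(s, safe="!*'()")


query = "Arg=" + encode_uri_component("+ ")
print(query)            # prints: Arg=%2B%20
print(parse_qs(query))  # prints: {'Arg': ['+ ']}
```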
 [2009-10-15 02:06 UTC] yolcoyama at gmail dot com
Since I ran into the same problem in PHP,
I wondered whether PHP is really the cause of the bug.
To check, I tried another scripting language (Python):
with the query "q=c++", the Python CGI script below outputs {'q': ['c  ']}.
This shows that the plus sign is replaced with a space independently of the language (at least, not only in PHP).

I found a workaround (not a fundamental fix) that recovers arithmetic characters in the query
as a raw string: rawurldecode(urlencode($whatever_qs))

It behaves as if each space is restored to a plus sign (or other arithmetic sign).

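* index.py
#!/usr/bin/env python3
# CGI script: print how the query string is parsed.
import os
from urllib.parse import parse_qs

print("Content-Type: text/plain; charset=utf-8")
print()
# parse_qs decodes '+' as a space, so "q=c++" prints {'q': ['c  ']}
print(parse_qs(os.environ.get("QUERY_STRING", "")))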
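One caveat about the rawurldecode(urlencode($whatever_qs)) workaround: it cannot distinguish a space that arrived as '+' from one that arrived as %20, so genuine spaces also come back as '+'. The Python sketch below mirrors the round trip, with quote_plus standing in for PHP's urlencode() and unquote for rawurldecode(). The name restore_plus is hypothetical. The analogy is approximate, because the two languages differ slightly in which characters they leave unescaped.

```python
from urllib.parse import quote_plus, unquote, unquote_plus


def restore_plus(decoded: str) -> str:
    """Mirror of rawurldecode(urlencode($value)) (hypothetical helper):
    re-encode with spaces as '+', then percent-decode without touching '+'."""
    return unquote(quote_plus(decoded))


# "q=c++" arrives form-decoded as "c  "; the round trip restores the pluses.
print(restore_plus(unquote_plus("c++")))    # prints: c++
# But a genuine space sent as %20 is turned into '+' as well.
print(restore_plus(unquote_plus("a%20b")))  # prints: a+b
```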

Shinobu Y.
 [2010-10-27 17:28 UTC] cataphract@php.net
-Status: Open
+Status: Bogus
-Package: Feature/Change Request
+Package: *General Issues
 [2012-02-23 02:33 UTC] techlivezheng at gmail dot com
Please use rawurldecode instead of urldecode to process $_GET values.
 [2012-02-23 05:45 UTC] techlivezheng at gmail dot com
My fault, this is actually not a bug. There is no need to use rawurldecode; otherwise, " + " will become "+++".

A value containing "+" in $_GET or $_POST must have been decoded once before being passed to PHP, and then decoded again by urldecode in PHP, which turns "+" into " ".

Apache may be doing that; one possible cause is the mod_rewrite module. The URL must be decoded before mod_rewrite can work on it, and it is not encoded again afterwards.

This is exactly what happened:

" + " --------> "+%2B+" --------> " + " --------> "   "

       apache          mod_rewrite         php

Using the [B] flag with mod_rewrite can fix this; see http://httpd.apache.org/docs/2.2/rewrite/flags.html#flag_b
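The double decoding can be simulated in Python under three explicit assumptions:

- A hypothetical rule rewrites /search/<term> to search.php?q=<term>.
- Apache percent-decodes the path, without turning '+' into a space, before the backreference is substituted.
- PHP form-decodes the resulting query string.

The [B] flag is modeled approximately as re-escaping the backreference with quote(..., safe=""); Apache's exact escaping set may differ.

```python
from urllib.parse import parse_qs, quote, unquote

# Hypothetical request: the client correctly encodes the term "a+b" in the path.
request_path = "/search/a%2Bb"
term = request_path[len("/search/"):]

# Apache decodes %XX in the path before mod_rewrite matches it; '+' stays '+'.
backreference = unquote(term)  # "a+b"

# Without [B], the decoded text is pasted into the query string as-is ...
query_plain = "q=" + backreference
# ... with [B], the backreference is escaped again first (approximation).
query_b_flag = "q=" + quote(backreference, safe="")

# PHP then form-decodes the query string into $_GET.
print(parse_qs(query_plain))   # prints: {'q': ['a b']}
print(parse_qs(query_b_flag))  # prints: {'q': ['a+b']}
```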
 
PHP Copyright © 2001-2014 The PHP Group
All rights reserved.
Last updated: Thu Apr 17 18:02:13 2014 UTC