Doc Bug #53187 urlencode and ~ encodes differently in the various versions of the functions
Submitted: 2010-10-28 01:10 UTC
Modified: 2011-01-27 00:55 UTC
From: asphp at dsgml dot com
Assigned: aharvey
Status: Closed
Package: Documentation problem
PHP Version: trunk-SVN-2010-10-27 (SVN)
Private report: No
CVE-ID: None
 [2010-10-28 01:10 UTC] asphp at dsgml dot com
Description:
------------
In urlencode a ~ (tilde) is escaped.

In rawurlencode, when using ASCII a ~ is NOT escaped, but when using EBCDIC it IS escaped.

There is no mention of ~ being special in the docs for rawurlencode or the docs for urlencode.

And if it is special for rawurlencode, it should act the same in ASCII and EBCDIC.


 [2010-10-28 01:20 UTC] asphp at dsgml dot com
rawurlencode was changed in revision 260750: http://svn.php.net/viewvc?view=revision&revision=260750

The commit mentions RFC 3986, but after reading the RFC it seems to me that ~ should also be left unencoded in urlencode, not just in rawurlencode.
 [2010-10-28 17:16 UTC] cataphract@php.net
-Type: Bug +Type: Documentation Problem
 [2010-10-28 17:16 UTC] cataphract@php.net
RFC 3986, which rawurlencode is apparently meant to follow (though the documentation mentions the obsolete RFC 1738; that should be changed), says:

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"

For consistency, percent-encoded octets in the ranges of ALPHA
   (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
   underscore (%5F), or tilde (%7E) should not be created by URI
   producers and, when found in a URI, should be decoded to their
   corresponding unreserved characters by URI normalizers.

So the behavior of rawurlencode is correct in not escaping ~.
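
To illustrate the RFC 3986 rule quoted above, here is a minimal sketch in C of a percent-encoder that leaves the "unreserved" set untouched. It is not PHP's php_raw_url_encode; the names is_unreserved and raw_encode_sketch are hypothetical, and the code assumes an ASCII execution character set and a caller-supplied buffer of at least 3 * strlen(src) + 1 bytes.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if c is in the RFC 3986 "unreserved" set
 * (ALPHA / DIGIT / "-" / "." / "_" / "~"), assuming ASCII input. */
int is_unreserved(unsigned char c)
{
    return (c < 0x80 && isalnum(c)) ||
           (c != '\0' && strchr("-._~", c) != NULL);
}

/* Percent-encodes src into dst, leaving unreserved characters as they are.
 * dst must hold at least 3 * strlen(src) + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t raw_encode_sketch(const char *src, char *dst)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (is_unreserved(c)) {
            dst[n++] = (char)c;          /* copied through unchanged */
        } else {
            dst[n++] = '%';              /* %XX with uppercase hex digits */
            dst[n++] = hex[c >> 4];
            dst[n++] = hex[c & 0x0F];
        }
    }
    dst[n] = '\0';
    return n;
}
```

With this sketch, the input "a~b c" is encoded as "a~b%20c": the tilde passes through and only the space is escaped.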

The urlencode documentation, on the other hand, refers to the application/x-www-form-urlencoded encoding mechanism. This is defined in the HTML 4.01 spec and the XForms 1.1 spec:

* http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
* http://www.w3.org/TR/2009/REC-xforms-20091020/#serialize-urlencode

The first points to RFC 1738, under which ~ should be encoded, as it currently is. The second mentions reserved characters "as defined by [RFC 2396] as amended by subsequent documents in the IETF track", so it would refer to RFC 3986, which obsoletes RFC 2396. I'd say the current behavior is correct because it "plays safe" by following HTML 4/RFC 1738.
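
For contrast, a form-style encoder along the lines described here might look like the sketch below. It follows the behavior documented for urlencode (alphanumerics and "-_." kept, space turned into "+", everything else percent-encoded, including ~). It is a hedged illustration, not PHP's implementation; form_encode_sketch is a hypothetical name, ASCII is assumed, and the caller must supply a buffer of at least 3 * strlen(src) + 1 bytes.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical form-style encoder: keeps ASCII alphanumerics and "-_.",
 * turns a space into '+', and percent-encodes every other byte, so '~'
 * becomes "%7E". dst must hold at least 3 * strlen(src) + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t form_encode_sketch(const char *src, char *dst)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;   /* never NUL inside the loop */
        if ((c < 0x80 && isalnum(c)) || strchr("-_.", c) != NULL) {
            dst[n++] = (char)c;
        } else if (c == ' ') {
            dst[n++] = '+';
        } else {
            dst[n++] = '%';
            dst[n++] = hex[c >> 4];
            dst[n++] = hex[c & 0x0F];
        }
    }
    dst[n] = '\0';
    return n;
}
```

Here the input "a~b c" is encoded as "a%7Eb+c", which is the difference from the RFC 3986 behavior that this report is about.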

As for EBCDIC, well, the support is broken anyway. urlencode/rawurlencode should receive the string in UTF-8 encoding, since that's the only way it can give meaningful results. The current implementation assumes that, in an EBCDIC system, it receives the string in EBCDIC. But yes, there's an inconsistency between the ASCII and EBCDIC version.
 [2010-11-03 21:53 UTC] kalle@php.net
-Package: URL related +Package: Documentation problem
 [2010-11-06 20:12 UTC] frozenfire@php.net
Automatic comment from SVN on behalf of frozenfire
Revision: http://svn.php.net/viewvc/?view=revision&revision=305134
Log: Updated to reflect RFC 3986, as per bug #53187.
 [2010-11-06 20:16 UTC] frozenfire@php.net
-Status: Open +Status: Closed -Assigned To: +Assigned To: frozenfire
 [2010-11-06 20:16 UTC] frozenfire@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.


 [2010-11-07 02:10 UTC] asphp at dsgml dot com
Thanks for fixing this!

I think you should also fix the inconsistency between the ASCII and EBCDIC versions before closing this. (I know EBCDIC doesn't work right, but the bug should be fixed for whenever it does work.)

Should I reopen it?
 [2010-11-07 02:31 UTC] frozenfire@php.net
-Status: Closed +Status: Feedback
 [2010-11-07 02:31 UTC] frozenfire@php.net
Could you please provide more information regarding the discrepancy in the EBCDIC version? I'm not familiar with EBCDIC.
 [2010-11-07 06:52 UTC] asphp at dsgml dot com
-Status: Feedback +Status: Assigned
 [2010-11-07 06:52 UTC] asphp at dsgml dot com
In the function: php_raw_url_encode

Change:

 if (!isalnum(str[y]) && strchr("_-.", str[y]) != NULL) { 

to:

 if (!isalnum(str[y]) && strchr("_-.~", str[y]) != NULL) { 

(Around line 588, add the ~ to the EBCDIC version of the function.)
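
The effect of adding ~ to the safe-character string can be sketched in isolation. The helper below is hypothetical (needs_escape is not a function in the PHP source) and only mirrors the isalnum/strchr pattern of the quoted condition. One assumption to flag: the sketch tests strchr(...) == NULL so that characters outside the safe set are the ones escaped; that is a choice made for this illustration and says nothing about what the PHP source should contain.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if c should be percent-escaped, given a
 * string of extra "safe" characters beyond the alphanumerics.
 * NUL is treated as needing escape, because strchr() would otherwise
 * match the terminator of the safe string. */
int needs_escape(unsigned char c, const char *safe)
{
    return !isalnum(c) && (c == '\0' || strchr(safe, c) == NULL);
}
```

Under these assumptions, needs_escape('~', "_-.") returns 1, while needs_escape('~', "_-.~") returns 0, which is the ASCII/EBCDIC difference described in this report.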
 [2010-11-07 15:55 UTC] frozenfire@php.net
-Status: Assigned +Status: Suspended
 [2010-11-07 15:55 UTC] frozenfire@php.net
Alright, the EBCDIC version of rawurlencode has been fixed in SVN (http://bugs.php.net/bug.php?id=53248), but of course that won't show up just yet.

Once the fix is released, I'll make a note in the documentation to that effect.
 [2011-01-02 22:03 UTC] frozenfire@php.net
-Status: Suspended +Status: To be documented -Assigned To: frozenfire +Assigned To:
 [2011-01-02 22:03 UTC] frozenfire@php.net
Changed to "to be documented".
 [2011-01-27 00:55 UTC] aharvey@php.net
Automatic comment from SVN on behalf of aharvey
Revision: http://svn.php.net/viewvc/?view=revision&revision=307773
Log: Fix doc bug #53187 (urlencode and ~ encodes differently in the various versions of the functions).
 [2011-01-27 00:55 UTC] aharvey@php.net
-Status: To be documented +Status: Closed -Assigned To: +Assigned To: aharvey
 [2011-01-27 00:55 UTC] aharvey@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.


 