Request #69409 urlencode() does not conform to current w3 html5 recommendations
Submitted: 2015-04-09 16:35 UTC Modified: 2015-04-15 13:06 UTC
Votes:1
Avg. Score:4.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: rainer-phpbugs at 7val dot com Assigned:
Status: Open Package: URL related
PHP Version: 5.6.7 OS: all
Private report: No CVE-ID: None

 [2015-04-09 16:35 UTC] rainer-phpbugs at 7val dot com
Description:
------------
---
From manual page: http://www.php.net/function.urlencode
---
urlencode() does not conform to the current W3C HTML5 recommendation for application/x-www-form-urlencoded data: '*' (0x2A) should be left as is, but urlencode() encodes it as %2A.

http://www.w3.org/TR/html5/forms.html#application/x-www-form-urlencoded-encoding-algorithm

If the byte is in the range 0x2A [...] Leave the byte as is.
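For reference, the byte serializer in the cited HTML5 section can be sketched as below. This is a minimal sketch, not the spec text: form_urlencode_html5() is a hypothetical helper, and the full list of pass-through bytes (0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A, 0x5F, 0x61 to 0x7A) is our reading of the range elided in the quote above, so check it against the spec before relying on it.

```php
<?php
// Hypothetical helper, not part of PHP. Assumes the HTML5 byte rules:
// 0x20 -> '+', a fixed set of bytes passes through, everything else -> %XX.
function form_urlencode_html5($s)
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b === 0x20) {
            // SPACE is replaced by a single '+'
            $out .= '+';
        } elseif ($b === 0x2A || $b === 0x2D || $b === 0x2E || $b === 0x5F
            || ($b >= 0x30 && $b <= 0x39)
            || ($b >= 0x41 && $b <= 0x5A)
            || ($b >= 0x61 && $b <= 0x7A)) {
            // '*', '-', '.', '_', digits and ASCII letters are left as is
            $out .= $s[$i];
        } else {
            // any other byte is percent-encoded with uppercase hex digits
            $out .= sprintf('%%%02X', $b);
        }
    }
    return $out;
}

echo form_urlencode_html5('a b*'), "\n"; // prints "a+b*"
```

Unlike urlencode('*'), which returns %2A, this leaves the asterisk alone while still mapping the space to '+'.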

RFC 3986 would also allow '*' to be left unchanged in the query part of a URL, because '*' is one of the sub-delims permitted in pchar:

http://tools.ietf.org/html/rfc3986

query         = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="
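To make the grammar concrete, here is a minimal sketch of a validator for the query production quoted above. is_rfc3986_query() is a hypothetical helper, not a PHP function; the regex simply transcribes query, pchar, unreserved (ALPHA / DIGIT / "-" / "." / "_" / "~") and pct-encoded ("%" HEXDIG HEXDIG) from RFC 3986.

```php
<?php
// Hypothetical helper, not part of PHP: does $query match the RFC 3986
// production  query = *( pchar / "/" / "?" ) ?
function is_rfc3986_query($query)
{
    // Character class: unreserved (ALPHA DIGIT - . _ ~), sub-delims
    // (! $ & ' ( ) * + , ; =), ':' and '@' from pchar, plus '/' and '?'.
    // Second alternative: pct-encoded = "%" HEXDIG HEXDIG.
    $pattern = '#\A(?:[A-Za-z0-9\-._~!$&\'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*\z#';
    return preg_match($pattern, $query) === 1;
}

var_dump(is_rfc3986_query('a=*')); // bool(true): '*' needs no escaping
```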


Test script:
---------------
<?php
print(urlencode('*'));


Expected result:
----------------
'*' should not be percent-encoded; the test script above should print the literal '*' rather than %2A.


Patches

php-urlencode-asterisk-0x2a (last revision 2015-04-09 16:36 UTC by rainer-phpbugs at 7val dot com)

History

 [2015-04-14 20:45 UTC] yohgaki@php.net
-Type: Documentation Problem +Type: Feature/Change Request
 [2015-04-14 20:45 UTC] yohgaki@php.net
urlencode() is there for compatibility. rawurlencode() does the job.

http://3v4l.org/L0AHV

We probably shouldn't have two similar functions in the first place...
In the long run, should we consolidate the two functions into one (urlencode) with a mode flag?

http_build_query() has the same issue, but it has a flag for selecting the encoding mode (PHP_QUERY_RFC1738 or PHP_QUERY_RFC3986).
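The mode flag referred to here is the fourth parameter of http_build_query() (enc_type, added in PHP 5.4). A short illustration: with either constant the '*' in the value is still encoded as %2A, which is the same issue; only the treatment of the space changes.

```php
<?php
$data = ['q' => 'a b*'];

// PHP_QUERY_RFC1738 (the default): space -> '+', as with urlencode()
echo http_build_query($data, '', '&', PHP_QUERY_RFC1738), "\n"; // q=a+b%2A

// PHP_QUERY_RFC3986: space -> '%20', as with rawurlencode()
echo http_build_query($data, '', '&', PHP_QUERY_RFC3986), "\n"; // q=a%20b%2A
```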
 [2015-04-15 13:06 UTC] rainer-phpbugs at 7val dot com
Sadly, rawurlencode() doesn't comply with another part of the HTML 5 recommendation:

If the byte is 0x20 (U+0020 SPACE if interpreted as ASCII)
    Replace the byte with a single 0x2B byte ("+" (U+002B) character if interpreted as ASCII).

http://3v4l.org/q8p9i

print("rue: ".rawurlencode(' ')."\n");
print("ue: ".urlencode(' ')."\n");

rue: %20
ue: +

I sincerely hope that no third encoding function is necessary.
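Until something changes in core, one possible userland workaround is to post-process urlencode() rather than rawurlencode(). Assuming the HTML5 byte rules (space to '+'; '*', '-', '.', '_', digits and ASCII letters unchanged; everything else percent-encoded), urlencode() already matches them except for '*', whereas rawurlencode() differs on the space (%20) and also on '~', which it leaves unencoded but HTML5 encodes. urlencode_html5() is a hypothetical name; this is a sketch, not an endorsed fix.

```php
<?php
// Hypothetical helper: HTML5-style form encoding built on urlencode().
// urlencode() maps ' ' to '+' and leaves A-Z a-z 0-9 - _ . alone, so the
// only byte it encodes that HTML5 leaves as is should be '*' (0x2A).
function urlencode_html5($s)
{
    // urlencode() emits uppercase hex and turns a literal '%' into '%25',
    // so every "%2A" in its output stands for an encoded '*'.
    return str_replace('%2A', '*', urlencode($s));
}

echo urlencode_html5('a b*~'), "\n"; // a+b*%7E
```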
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Wed Dec 04 18:01:31 2024 UTC