Request #69409 urlencode() does not conform to current w3 html5 recommendations
Submitted: 2015-04-09 16:35 UTC Modified: 2015-04-15 13:06 UTC
Votes:1
Avg. Score:4.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: rainer-phpbugs at 7val dot com Assigned:
Status: Open Package: URL related
PHP Version: 5.6.7 OS: all
Private report: No CVE-ID: None
 [2015-04-09 16:35 UTC] rainer-phpbugs at 7val dot com
Description:
------------
---
From manual page: http://www.php.net/function.urlencode
---
urlencode() does not conform to the current W3C HTML5 recommendation for application/x-www-form-urlencoded data, in that '*' (0x2A) should be left as is.

http://www.w3.org/TR/html5/forms.html#application/x-www-form-urlencoded-encoding-algorithm

If the byte is in the range 0x2A [...] Leave the byte as is.

RFC 3986 would also allow '*' to be left unchanged in the query part of a URL.

http://tools.ietf.org/html/rfc3986

query       = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="
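
To make the gap between the grammar above and current behaviour concrete, the following sketch runs every sub-delims character through urlencode(). The variable names are illustrative only. Note that some of these characters ('&', '=', '+') still have to be encoded inside a form value because they carry meaning there; the point is only that '*' is not one of them.

```php
<?php
// The sub-delims characters from the RFC 3986 grammar quoted above.
$subDelims = ['!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '='];

// Map each character to what urlencode() produces for it.
$encoded = [];
foreach ($subDelims as $char) {
    $encoded[$char] = urlencode($char);
}

// Print one "character => encoded form" pair per line.
foreach ($encoded as $char => $result) {
    printf("%s => %s\n", $char, $result);
}
```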


Test script:
---------------
<?php
print(urlencode('*'));


Expected result:
----------------
'*' should not be %-encoded, i.e. the test script above should not print %2A, but the literal '*'.


Patches

php-urlencode-asterisk-0x2a (last revision 2015-04-09 16:36 UTC by rainer-phpbugs at 7val dot com)

Pull Requests

History

 [2015-04-14 20:45 UTC] yohgaki@php.net
-Type: Documentation Problem +Type: Feature/Change Request
 [2015-04-14 20:45 UTC] yohgaki@php.net
urlencode() is there for compatibility. rawurlencode() does the job.

http://3v4l.org/L0AHV

We shouldn't have 2 similar functions in the first place, probably...
Consolidate the 2 functions into one (urlencode) with a mode flag in the long run?

http_build_query() has the same issue, but it has a flag for modes.
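
For reference, a minimal sketch of the mode flag mentioned above: http_build_query() takes an encoding type as its fourth argument, either PHP_QUERY_RFC1738 (the default, urlencode()-style) or PHP_QUERY_RFC3986 (rawurlencode()-style). The sample data is made up for illustration. In both modes '*' still comes out as %2A.

```php
<?php
// Sample data containing both a space and an asterisk.
$data = ['q' => 'a b*'];

// Default mode: spaces become '+', as with urlencode().
$formStyle = http_build_query($data, '', '&', PHP_QUERY_RFC1738);

// RFC 3986 mode: spaces become '%20', as with rawurlencode().
$rfc3986Style = http_build_query($data, '', '&', PHP_QUERY_RFC3986);

echo $formStyle, "\n";
echo $rfc3986Style, "\n";
```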
 [2015-04-15 13:06 UTC] rainer-phpbugs at 7val dot com
Sadly, rawurlencode() doesn't comply with another part of the HTML 5 recommendation:

If the byte is 0x20 (U+0020 SPACE if interpreted as ASCII)
    Replace the byte with a single 0x2B byte ("+" (U+002B) character if interpreted as ASCII).

http://3v4l.org/q8p9i

print("rue: ".rawurlencode(' ')."\n");
print("ue: ".urlencode(' ')."\n");

rue: %20
ue: +

I sincerely hope that no third encoding function is necessary.
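
As a userland stopgap, the two requirements quoted in this report (space becomes '+', '*' stays literal) can be met by post-processing urlencode() output. This is only a sketch: the function name html5_form_urlencode() is hypothetical and not part of PHP or of the attached patch. It relies on urlencode() emitting '%2A' only for a literal '*', since an input '%' is itself encoded as '%25'.

```php
<?php
/**
 * Hypothetical helper: urlencode() with '*' (0x2A) left as is,
 * as the HTML5 form-encoding algorithm quoted above asks for.
 */
function html5_form_urlencode($value)
{
    // urlencode() already maps space to '+'; only '*' needs restoring.
    return str_replace('%2A', '*', urlencode($value));
}

print("h5: " . html5_form_urlencode(' ') . "\n");
print("h5: " . html5_form_urlencode('*') . "\n");
```

Wrapping the existing function rather than re-implementing the byte loop keeps every other character's handling identical to urlencode().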
 