Request #71231 parse_url doesn't urldecode urlencoded characters
Submitted: 2015-12-28 19:31 UTC Modified: 2016-02-01 17:25 UTC
From: zelnaga at gmail dot com Assigned: willfitch (profile)
Status: Not a bug Package: URL related
PHP Version: 7.0.1 OS: Windows 7
Private report: No CVE-ID: None
 [2015-12-28 19:31 UTC] zelnaga at gmail dot com
If you urlencode a character in a URL, then the parsed version of that URL ought to contain that character urldecoded in the output.

Test script:
$a = parse_url('');


Expected result:
    [scheme] => https
    [host] =>
    [path] => /search
    [query] => q=test

Actual result:
    [scheme] => https
    [host] =>
    [path] => /se%61rch
    [query] => q=test


 [2015-12-28 20:47 UTC]
-Type: Bug +Type: Feature/Change Request
 [2015-12-28 20:47 UTC]
This is not a bug.
 [2015-12-28 20:50 UTC]
If anyone cares to implement a decoding feature, please _never_ decode by default, since double (multiple) decoding is a cause of security issues.
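
To illustrate the double-decoding concern, here is a minimal sketch. The input string is a hypothetical example, not taken from this report: a `../` sequence percent-encoded twice. A check run after the first decode sees nothing suspicious, while a second decode further down the stack produces the traversal sequence.

```php
<?php
// Hypothetical input: '../etc/passwd' percent-encoded twice
// (each '%' of the first encoding becomes '%25').
$input = '%252e%252e%252fetc%252fpasswd';

$once  = rawurldecode($input); // result of a single decoding layer
$twice = rawurldecode($once);  // result of a second, accidental decode

// A validation step that runs after the first decode finds no literal '../' ...
$passesCheck = strpos($once, '../') === false;

// ... but the second decode yields one.
$traversal = strpos($twice, '../') !== false;

var_dump($once, $twice, $passesCheck, $traversal);
```

This is why a parser that decodes by default can be risky: callers that already decode on their own would end up decoding twice.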
 [2015-12-29 04:16 UTC] zelnaga at gmail dot com
Double decoding would be a BC-breaking change, but nonetheless I believe it's the correct behavior to decode. If you make an HTTP request to that URL, Google's web server decodes it as /search?q=test.

For example, if you do /search?q=te%28st you'd get q=te%28st in the query part of the array, but I believe it should be q=te(st because that's what the web server would see with $_GET['q'], not te%28st.
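
As a rough illustration of what PHP would place in `$_GET` for that query string, `parse_str` can be used: it parses a string as if it were a URL query string and percent-decodes the values. The variable names here are just for the sketch.

```php
<?php
// Query string from the comment above.
$queryString = 'q=te%28st';

// parse_str decodes keys and values the way a request query string is decoded.
parse_str($queryString, $params);

var_dump($params['q']); // te(st  (%28 is the percent-encoding of '(')
```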
 [2016-01-15 08:07 UTC] simonsimcity at gmail dot com
I don't think it should be decoded at that level here.
Think of requests like the following:



I guess the last one is negligible, but I could well see the first example ...
 [2016-02-01 16:29 UTC]
@yohgaki - I completely agree it would result in unexpected behaviors for previous versions, but this is a bug.  The path is never decoded - and that isn't double decoding.  Decoding the path/file AND query parameters should be the expected behavior.  It is for pretty much every other language as well.

For the BC concern, this would be addressing a bug, so I don't agree that we'd "break" backwards compatibility - but rather fix it.
 [2016-02-01 16:30 UTC]
-Assigned To: +Assigned To: willfitch
 [2016-02-01 17:25 UTC]
-Status: Assigned +Status: Not a bug
 [2016-02-01 17:25 UTC]
Take that back - I forgot parse_url doesn't actually decode anything other than control characters. urldecode *does* decode the value correctly.

@zelnaga - if you need to decode, use urldecode.
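
A minimal sketch of that approach follows. The URL is hypothetical (example.com stands in for the host that is missing from the report). It uses `rawurldecode` for the path rather than `urldecode`, because `urldecode` also turns `+` into a space, which is usually only wanted for query strings; `parse_str` handles the query part.

```php
<?php
// Hypothetical URL; the host in the original report is not shown.
$url = 'https://example.com/se%61rch?q=te%28st';

// parse_url splits the URL but leaves percent-encoding untouched.
$parts = parse_url($url);

// Decode the path explicitly, exactly once.
$path = rawurldecode($parts['path']);

// Decode the query string into an array, as PHP does for $_GET.
parse_str($parts['query'] ?? '', $query);

var_dump($parts['path'], $path, $query);
```

Decoding each component separately, after splitting, keeps an encoded `?`, `#`, or `/` inside a component from being mistaken for a delimiter.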