Request #71231 parse_url doesn't urldecode urlencoded characters
Submitted: 2015-12-28 19:31 UTC Modified: 2016-02-01 17:25 UTC
From: zelnaga at gmail dot com Assigned: willfitch (profile)
Status: Not a bug Package: URL related
PHP Version: 7.0.1 OS: Windows 7
Private report: No CVE-ID: None

 [2015-12-28 19:31 UTC] zelnaga at gmail dot com
Description:
------------
If you urlencode a character in a URL, then the parsed version of that URL ought to contain that character urldecoded in the output.

Test script:
---------------
<?php
$a = parse_url('https://www.google.com/se%61rch?q=test');

print_r($a);

Expected result:
----------------
Array
(
    [scheme] => https
    [host] => www.google.com
    [path] => /search
    [query] => q=test
)

Actual result:
--------------
Array
(
    [scheme] => https
    [host] => www.google.com
    [path] => /se%61rch
    [query] => q=test
)

 [2015-12-28 20:47 UTC] yohgaki@php.net
-Type: Bug +Type: Feature/Change Request
 [2015-12-28 20:47 UTC] yohgaki@php.net
This is not a bug.
 [2015-12-28 20:50 UTC] yohgaki@php.net
If anyone cares to implement a decoding feature, please _never_ decode by default, since double (multiple) decoding is a cause of security issues.
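To illustrate the double-decoding concern, here is a minimal sketch. The input path is hypothetical (it does not come from this report); it only shows how a value that looks harmless after one decoding pass can change meaning after a second.

```php
<?php
// Hypothetical input: the percent sign itself is encoded as %25,
// so the string carries "%2F" hidden one level down.
$raw = '/files/..%252Fsecret';

// First pass: "%25" becomes "%", leaving a still-encoded slash.
$once = rawurldecode($raw);

// Second pass: "%2F" becomes "/", producing a path traversal sequence.
$twice = rawurldecode($once);

var_dump($once);   // "/files/..%2Fsecret"
var_dump($twice);  // "/files/../secret"
```

A check that runs on `$once` would see no `../`, while code that decodes again later would act on `$twice`.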
 [2015-12-29 04:16 UTC] zelnaga at gmail dot com
Double decoding would be a BC-breaking change, but nonetheless I believe it's the correct behavior to decode. If you make an HTTP request to that URL, Google's web server decodes it as /search?q=test.

For example, if you do /search?q=te%28st you'd get q=te%28st in the query part of the array, but I believe it should be q=te(st because that's what the web server would see with $_GET['q'], not te%28st.
 [2016-01-15 08:07 UTC] simonsimcity at gmail dot com
I don't think it should be decoded at that level here.
Think of requests like the following:

var_dump(parse_url("/?foo=ab%26bar%3Dfoo"));
var_dump(parse_url(urldecode("/?foo=ab%26bar%3Dfoo")));

var_dump(parse_url("/foo%2Fab/bar"));
var_dump(parse_url(urldecode("/foo%2Fab/bar")));

I guess the last one is negligible, but I could well see the first example ...
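As a sketch of what the two orderings above produce, the snippet below takes the same two URLs and compares "parse, then decode" with "decode, then parse". The variable names are illustrative only.

```php
<?php
$encodedQuery = '/?foo=ab%26bar%3Dfoo';

// Parse first, then let parse_str() decode: "%26" stays inside the value.
parse_str(parse_url($encodedQuery, PHP_URL_QUERY), $parseFirst);

// Decode first, then parse: "%26" has become a real "&" separator.
parse_str(parse_url(urldecode($encodedQuery), PHP_URL_QUERY), $decodeFirst);

var_dump($parseFirst);   // one key:  foo => "ab&bar=foo"
var_dump($decodeFirst);  // two keys: foo => "ab", bar => "foo"

$encodedPath = '/foo%2Fab/bar';

// Split on the structural slashes first, then decode each segment.
$segmentsParseFirst = array_map(
    'rawurldecode',
    explode('/', trim(parse_url($encodedPath, PHP_URL_PATH), '/'))
);

// Decoding first turns "%2F" into a structural slash before splitting.
$segmentsDecodeFirst = explode(
    '/',
    trim(parse_url(urldecode($encodedPath), PHP_URL_PATH), '/')
);

var_dump($segmentsParseFirst);   // ["foo/ab", "bar"]
var_dump($segmentsDecodeFirst);  // ["foo", "ab", "bar"]
```

In both cases, decoding before parsing loses the distinction between an encoded delimiter and a structural one.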
 [2016-02-01 16:29 UTC] willfitch@php.net
@yohgaki - I completely agree it would result in unexpected behaviors for previous versions, but this is a bug.  The path is never decoded - and that isn't double decoding.  Decoding the path/file AND query parameters should be the expected behavior.  It is for pretty much every other language as well.

For the BC concern, this would be addressing a bug, so I don't agree that we'd "break" backwards compatibility - but rather fix it.
 [2016-02-01 16:30 UTC] willfitch@php.net
-Assigned To: +Assigned To: willfitch
 [2016-02-01 17:25 UTC] willfitch@php.net
-Status: Assigned +Status: Not a bug
 [2016-02-01 17:25 UTC] willfitch@php.net
Take that back - I forgot parse_url doesn't actually decode anything other than control characters. urldecode *does* decode the value correctly.

@zelnaga - if you need to decode, use urldecode.
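A minimal sketch of decoding after parsing, assuming the URL from the test script combined with the %28 query value mentioned earlier in the thread. Using rawurldecode() for the path and parse_str() for the query is one reasonable choice here, not the only one.

```php
<?php
$parts = parse_url('https://www.google.com/se%61rch?q=te%28st');

// rawurldecode() leaves "+" untouched, which suits paths;
// urldecode() would also turn "+" into a space.
$path = rawurldecode($parts['path']);

// parse_str() splits on "&" and "=" first, then decodes each value,
// much as PHP does when it fills $_GET.
parse_str($parts['query'], $query);

var_dump($path);   // "/search"
var_dump($query);  // ["q" => "te(st"]
```

If a path may contain an encoded slash (%2F), decode it segment by segment rather than as a whole string.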
 