Request #71231 parse_url doesn't urldecode urlencoded characters
Submitted: 2015-12-28 19:31 UTC Modified: 2016-02-01 17:25 UTC
From: zelnaga at gmail dot com Assigned: willfitch (profile)
Status: Not a bug Package: URL related
PHP Version: 7.0.1 OS: Windows 7
Private report: No CVE-ID: None

 [2015-12-28 19:31 UTC] zelnaga at gmail dot com
Description:
------------
If you urlencode a character in a URL, then the parsed version of that URL ought to contain that character urldecoded in the output.

Test script:
---------------
<?php
$a = parse_url('https://www.google.com/se%61rch?q=test');

print_r($a);

Expected result:
----------------
Array
(
    [scheme] => https
    [host] => www.google.com
    [path] => /search
    [query] => q=test
)

Actual result:
--------------
Array
(
    [scheme] => https
    [host] => www.google.com
    [path] => /se%61rch
    [query] => q=test
)

 [2015-12-28 20:47 UTC] yohgaki@php.net
-Type: Bug +Type: Feature/Change Request
 [2015-12-28 20:47 UTC] yohgaki@php.net
This is not a bug.
 [2015-12-28 20:50 UTC] yohgaki@php.net
If anyone cares to implement a decoding feature, please _never_ decode by default, since double (or multiple) decoding is a cause of security issues.
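To illustrate the double-decoding concern, here is a minimal sketch. The input string and the naive `../` check are hypothetical and chosen only for demonstration; they are not taken from this report.

```php
<?php
// Hypothetical input: a relative path that has been percent-encoded twice.
$input = '%252e%252e%252fetc%252fpasswd';

// First decode: "%25" becomes "%", so the result is still percent-encoded.
$once = rawurldecode($input);

// A naive traversal check run at this point finds nothing suspicious.
$looksSafe = strpos($once, '../') === false;

// A second decode somewhere further down the stack reveals the traversal.
$twice = rawurldecode($once);

var_dump($once);      // string(21) "%2e%2e%2fetc%2fpasswd"
var_dump($looksSafe); // bool(true)
var_dump($twice);     // string(13) "../etc/passwd"
```

If a parser decoded by default and the caller decoded again, a value validated after the first pass could change meaning after the second.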
 [2015-12-29 04:16 UTC] zelnaga at gmail dot com
Double decoding would be a BC-breaking change, but nonetheless I believe decoding is the correct behavior. If you make an HTTP request to that URL, Google's web server decodes it as /search?q=test.

For example, if you request /search?q=te%28st, you'd get q=te%28st in the query part of the array, but I believe it should be q=te(st, because that's what the web server would see in $_GET['q'], not te%28st.
 [2016-01-15 08:07 UTC] simonsimcity at gmail dot com
I don't think it should be decoded at that level here.
Think of requests like the following:

var_dump(parse_url("/?foo=ab%26bar%3Dfoo"));
var_dump(parse_url(urldecode("/?foo=ab%26bar%3Dfoo")));

var_dump(parse_url("/foo%2Fab/bar"));
var_dump(parse_url(urldecode("/foo%2Fab/bar")));

I guess the last one is negligible, but I could well see the first example ...
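The difference between the two orders can be made concrete with a short sketch built on the same two inputs. The variable names, and the use of `parse_str()` and `explode()` to inspect the results, are illustrative choices and not part of the original comment.

```php
<?php
// Query case: an encoded "&" and "=" inside a single value.
$url = '/?foo=ab%26bar%3Dfoo';

// Parse first, then let parse_str() decode each value.
parse_str(parse_url($url, PHP_URL_QUERY), $parseFirst);

// Decode first, then parse: the decoded "&" and "=" become delimiters.
parse_str(parse_url(urldecode($url), PHP_URL_QUERY), $decodeFirst);

print_r($parseFirst);  // one parameter:  foo => "ab&bar=foo"
print_r($decodeFirst); // two parameters: foo => "ab", bar => "foo"

// Path case: an encoded "/" inside a single segment.
$path = '/foo%2Fab/bar';

$segmentsParseFirst  = explode('/', trim(parse_url($path, PHP_URL_PATH), '/'));
$segmentsDecodeFirst = explode('/', trim(parse_url(urldecode($path), PHP_URL_PATH), '/'));

print_r($segmentsParseFirst);  // 2 segments: "foo%2Fab", "bar"
print_r($segmentsDecodeFirst); // 3 segments: "foo", "ab", "bar"
```

In both cases, decoding before splitting turns encoded data into structural delimiters, which is why the decode step arguably belongs after the URL has been split.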
 [2016-02-01 16:29 UTC] willfitch@php.net
@yohgaki - I completely agree it would result in unexpected behaviors for previous versions, but this is a bug.  The path is never decoded - and that isn't double decoding.  Decoding the path/file AND query parameters should be the expected behavior.  It is for pretty much every other language as well.

For the BC concern, this would be addressing a bug, so I don't agree that we'd "break" backwards compatibility - but rather fix it.
 [2016-02-01 16:30 UTC] willfitch@php.net
-Assigned To: +Assigned To: willfitch
 [2016-02-01 17:25 UTC] willfitch@php.net
-Status: Assigned +Status: Not a bug
 [2016-02-01 17:25 UTC] willfitch@php.net
Take that back - I forgot parse_url doesn't actually decode anything other than control characters. urldecode *does* decode the value correctly.

@zelnaga - if you need to decode, use urldecode.
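A minimal sketch of that approach, assuming the goal is the decoded path and query values discussed earlier in this thread: split the URL with parse_url first, then decode each component separately. The example URL combines the two encoded strings from the earlier comments and is otherwise hypothetical.

```php
<?php
$parts = parse_url('https://www.google.com/se%61rch?q=te%28st');

// Decode the path only after the URL has been split into components.
$path = urldecode($parts['path']);

// parse_str() splits the query on "&" and "=" and decodes each value,
// similar to what ends up in $_GET on the receiving server.
parse_str($parts['query'], $query);

var_dump($parts['path']); // string(10) "/se%61rch"
var_dump($path);          // string(7) "/search"
var_dump($query['q']);    // string(5) "te(st"
```

Note that urldecode also turns "+" into a space; for paths where a literal "+" must survive, rawurldecode may be the better fit.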
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Dec 26 23:01:28 2024 UTC