Bug #70942 parse_url for url without port and scheme, but with port-like string in url
Submitted: 2015-11-19 15:48 UTC Modified: 2021-04-21 12:28 UTC
Votes:5
Avg. Score:4.0 ± 0.9
Reproduced:5 of 5 (100.0%)
Same Version:0 (0.0%)
Same OS:2 (40.0%)
From: taco at procurios dot nl Assigned: cmb
Status: Not a bug Package: URL related
PHP Version: 5.6.15 OS: linux
Private report: No CVE-ID: None
 [2015-11-19 15:48 UTC] taco at procurios dot nl
Description:
------------
When a URL without a scheme or port, but with a port-like string (e.g. example.com/foo:1234), is passed to parse_url(), a port number is returned. This wasn't the case in PHP 5.6.6, at least (3v4l.org seems to crash on this code...).

I'm guessing this commit introduced this issue: https://github.com/php/php-src/commit/e49922d3f8060e47f810a24ce48d4e622b493699

Test script:
---------------
var_dump(parse_url('//example.com/foo:1234'));

Expected result:
----------------
Either

bool(false)

or

array(2) {
  ["host"]=>
  string(11) "example.com"
  ["path"]=>
  string(9) "/foo:1234"
}


Actual result:
--------------
array(3) {
  ["host"]=>
  string(11) "example.com"
  ["port"]=>
  int(1234)
  ["path"]=>
  string(9) "/foo:1234"
}


History

 [2015-11-20 09:25 UTC] taco at procurios dot nl
The commit I mentioned does not introduce this issue. Afaik, this issue is much older, but was 'masked' by bug #68917. After that bug was fixed (https://github.com/php/php-src/commit/d7fb52ea20aba5c9ef53dfa57af3b62717c9e9e5#diff-8c81b7e6f1bafce737814315214a5f23), parse_url no longer returned false for URLs without a scheme but with a colon.

parse_url assumes that the colon character can only be used for the port subcomponent of a URI, but it is allowed in the path component too (see: https://tools.ietf.org/html/rfc3986#section-3.3). The parser should look for a colon character, but not after the first single slash in a URL; anything after that slash is part of the path, query or fragment.
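
To illustrate the idea, here is a minimal sketch in userland PHP. It is not how parse_url() is implemented, and split_network_path() is a hypothetical helper name. It only handles scheme-less "//host..." references, ignores userinfo, and does no validation. The point is just that the authority is cut off first, and a port is only searched for inside it:

```php
<?php
// Hypothetical helper, not part of PHP. Splits a scheme-less
// "//authority/path" reference and treats a colon as a port
// delimiter only inside the authority.
function split_network_path(string $url): ?array
{
    if (strncmp($url, '//', 2) !== 0) {
        return null; // this sketch only covers references starting with "//"
    }
    $rest = substr($url, 2);

    // The authority ends at the first "/", "?" or "#" (RFC 3986, section 3.2).
    $authLen   = strcspn($rest, '/?#');
    $authority = substr($rest, 0, $authLen);
    $tail      = substr($rest, $authLen);

    $result = [];

    // A port is a trailing ":digits" of the authority, never of the path.
    if (preg_match('/^(.*):(\d*)$/', $authority, $m)) {
        $result['host'] = $m[1];
        if ($m[2] !== '') {
            $result['port'] = (int) $m[2];
        }
    } else {
        $result['host'] = $authority;
    }

    // Everything up to "?" or "#" in the remainder is the path.
    $pathLen = strcspn($tail, '?#');
    if ($pathLen > 0) {
        $result['path'] = substr($tail, 0, $pathLen);
    }

    return $result;
}
```

With this sketch, split_network_path('//example.com/foo:1234') yields only a host and the path "/foo:1234", while split_network_path('//example.com:80/') yields host, port 80 and the path "/".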
 [2015-11-20 14:46 UTC] laruence@php.net
-Assigned To: +Assigned To: datibbaw
 [2015-11-20 14:46 UTC] laruence@php.net
@datibbaw, could you have a look into this one? thanks
 [2016-08-17 09:50 UTC] driezasson at icloud dot com
Even a URL with a port, but without a scheme doesn’t get parsed. E.g.: '//example.com:80/'.

Expected result:
['host' => 'example.com', 'port' => 80]

Actual result:
false
 [2017-10-24 07:15 UTC] kalle@php.net
-Status: Assigned +Status: Open -Assigned To: datibbaw +Assigned To:
 [2021-04-21 12:28 UTC] cmb@php.net
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2021-04-21 12:28 UTC] cmb@php.net
The problem is that parse_url() is so broken for partial URLs,
that whenever you fix something related to this, you likely
introduce a regression.  Thus, it's likely best to introduce a new
function with sane parsing conforming to RFC 3986.  This would
require the RFC process[1], though.

Anyhow, it is already documented[2] that

| Partial URLs are also accepted, parse_url() tries its best to
| parse them correctly.

So this is not a bug.

[1] <https://wiki.php.net/rfc/howto>
[2] <https://www.php.net/parse_url>
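
For reference, a rough sketch of what RFC 3986-style splitting could look like in userland. This is not the new function suggested above, and rfc3986_split() is a hypothetical name. It only applies the component-splitting regular expression from RFC 3986, Appendix B, and does not validate anything or break the authority down into host and port:

```php
<?php
// Hypothetical helper, not part of PHP. Splits a URI reference into
// its five top-level components using the pattern from RFC 3986,
// Appendix B. Missing components are reported as null.
function rfc3986_split(string $uri): array
{
    preg_match(
        '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~',
        $uri,
        $m
    );

    // Groups 1, 3, 6 and 8 include the delimiter, so a non-empty match
    // there means the component is present (possibly as an empty string).
    return [
        'scheme'    => ($m[1] ?? '') !== '' ? $m[2] : null,
        'authority' => ($m[3] ?? '') !== '' ? $m[4] : null,
        'path'      => $m[5] ?? '',
        'query'     => ($m[6] ?? '') !== '' ? $m[7] : null,
        'fragment'  => ($m[8] ?? '') !== '' ? $m[9] : null,
    ];
}
```

For '//example.com/foo:1234' this gives the authority "example.com" and the path "/foo:1234", because the authority stops at the first slash and the colon in the path is never considered.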
 