Bug #70942 parse_url for url without port and scheme, but with port-like string in url
Submitted: 2015-11-19 15:48 UTC Modified: 2021-04-21 12:28 UTC
Votes:5
Avg. Score:4.0 ± 0.9
Reproduced:5 of 5 (100.0%)
Same Version:0 (0.0%)
Same OS:2 (40.0%)
From: taco at procurios dot nl Assigned: cmb (profile)
Status: Not a bug Package: URL related
PHP Version: 5.6.15 OS: linux
Private report: No CVE-ID: None
 [2015-11-19 15:48 UTC] taco at procurios dot nl
Description:
------------
When a URL that has neither a scheme nor a port, but contains a port-like string in its path (e.g. example.com/foo:1234), is passed to parse_url(), a port number is returned. This was not the case in PHP 5.6.6, at least (3v4l.org seems to crash on this code...).

I'm guessing this commit introduced this issue: https://github.com/php/php-src/commit/e49922d3f8060e47f810a24ce48d4e622b493699

Test script:
---------------
var_dump(parse_url('//example.com/foo:1234'));

Expected result:
----------------
Either

bool(false)

or

array(2) {
  ["host"]=>
  string(11) "example.com"
  ["path"]=>
  string(9) "/foo:1234"
}


Actual result:
--------------
array(3) {
  ["host"]=>
  string(11) "example.com"
  ["port"]=>
  int(1234)
  ["path"]=>
  string(9) "/foo:1234"
}


 [2015-11-20 09:25 UTC] taco at procurios dot nl
The commit I mentioned did not introduce this issue. As far as I know, this issue is much older, but it was masked by bug #68917. After that bug was fixed (https://github.com/php/php-src/commit/d7fb52ea20aba5c9ef53dfa57af3b62717c9e9e5#diff-8c81b7e6f1bafce737814315214a5f23), parse_url() no longer returned false for URLs without a scheme but with a colon.

parse_url() assumes that the colon character can only introduce the port subcomponent of a URI, but a colon is allowed in the path component too (see: https://tools.ietf.org/html/rfc3986#section-3.3). The parser should look for a port colon only before the first single slash in a URL; anything after that slash is part of the path, query or fragment.
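
To illustrate the rule above, here is a minimal sketch of a port lookup that is restricted to the authority. The function name split_network_path() is hypothetical and not part of PHP. The sketch assumes the input starts with "//" and ignores userinfo and other edge cases, so it is not a replacement for parse_url().

```php
<?php
// Hypothetical helper, not part of PHP: splits a scheme-less
// "//host[:port]/path" reference. Userinfo is not handled.
function split_network_path(string $url): ?array
{
    if (strncmp($url, '//', 2) !== 0) {
        return null; // only "//..." references are handled in this sketch
    }
    $rest = substr($url, 2);

    // The authority ends at the first "/", "?" or "#" (RFC 3986, section 3.2).
    $authorityLength = strcspn($rest, '/?#');
    $authority = substr($rest, 0, $authorityLength);
    $remainder = substr($rest, $authorityLength);

    $result = [];
    // Look for a port inside the authority only, never in the path.
    if (preg_match('/^(.*):(\d+)$/', $authority, $m) === 1) {
        $result['host'] = $m[1];
        $result['port'] = (int) $m[2];
    } else {
        $result['host'] = $authority;
    }

    // The path runs up to the first "?" or "#".
    $pathLength = strcspn($remainder, '?#');
    if ($pathLength > 0) {
        $result['path'] = substr($remainder, 0, $pathLength);
    }
    return $result;
}

var_dump(split_network_path('//example.com/foo:1234'));
```

With this approach the ":1234" in the test script stays in the path, because the search for a port stops where the authority ends.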
 [2015-11-20 14:46 UTC] laruence@php.net
-Assigned To: +Assigned To: datibbaw
 [2015-11-20 14:46 UTC] laruence@php.net
@datibbaw, could you have a look into this one? thanks
 [2016-08-17 09:50 UTC] driezasson at icloud dot com
Even a URL with a port but without a scheme doesn't get parsed, e.g. '//example.com:80/'.

Expected result:
['host' => 'example.com', 'port' => 80]

Actual result:
false
 [2017-10-24 07:15 UTC] kalle@php.net
-Status: Assigned +Status: Open -Assigned To: datibbaw +Assigned To:
 [2021-04-21 12:28 UTC] cmb@php.net
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2021-04-21 12:28 UTC] cmb@php.net
The problem is that parse_url() is so broken for partial URLs
that whenever you fix something related to this, you likely
introduce a regression.  Thus, it's likely best to introduce a new
function with sane parsing conforming to RFC 3986.  This would
require the RFC process[1], though.

Anyhow, it is already documented[2] that

| Partial URLs are also accepted, parse_url() tries its best to
| parse them correctly.

So this is not a bug.

[1] <https://wiki.php.net/rfc/howto>
[2] <https://www.php.net/parse_url>
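
For reference, RFC 3986 (Appendix B) gives a regular expression that splits any URI reference into its five components. The sketch below wraps it in a hypothetical function, split_uri_reference(), which is not an existing or proposed PHP API. It assumes PHP 7.2 or later for PREG_UNMATCHED_AS_NULL, and it leaves the authority undivided, so the host and port would still need to be separated in a second step.

```php
<?php
// Hypothetical helper, not part of PHP: splits a URI reference with the
// regular expression from RFC 3986, Appendix B.
function split_uri_reference(string $uri): array
{
    // Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$~';
    preg_match($pattern, $uri, $m, PREG_UNMATCHED_AS_NULL);

    return [
        'scheme'    => $m[2] ?? null,
        'authority' => $m[4] ?? null, // may still contain userinfo and port
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? null,
        'fragment'  => $m[9] ?? null,
    ];
}

var_dump(split_uri_reference('//example.com/foo:1234'));
```

Because the authority group cannot contain a slash, a colon that appears later in the path can never be taken for a port separator.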
 