Doc Bug #74599 parse_url allows bad characters in the common name
Submitted: 2017-05-15 18:51 UTC Modified: 2021-09-24 16:12 UTC
From: bj dot cardon at gmail dot com Assigned: cmb (profile)
Status: Closed Package: URL related
PHP Version: 7.0.19 OS: Linux
Private report: No CVE-ID: None

 [2017-05-15 18:51 UTC] bj dot cardon at gmail dot com
Description:
------------
As the title says, the parse_url function allows backslashes to be in the hostname part of a URL and considers it valid. You can see the test script below showing this behavior.

Test script:
---------------
bcardon@bcardon-base:~$ php -a
Interactive mode enabled

php > $u = parse_url("https://www.example.com\\.google.com");
php > print_r($u);
Array
(
    [scheme] => https
    [host] => www.example.com\.google.com
)

Expected result:
----------------
The expected behavior is that invalid characters (including backslashes) will cause parse_url to return FALSE as with any invalid URL.


History

 [2017-05-15 23:56 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2017-05-15 23:56 UTC] requinix@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

> This function is not meant to validate the given URL, it only breaks it up into
> the above listed parts.
 [2017-05-16 01:31 UTC] bj dot cardon at gmail dot com
I don't need "validation". I need a function that claims to parse a URL not to parse it in a nonsensical way.

The documentation also says:

> Invalid characters are replaced by _

That is not happening with the invalid character '\'.
 [2017-05-16 01:37 UTC] bj dot cardon at gmail dot com
On a side note, perhaps returning FALSE is not the right expected behavior. Still, I don't think the current behavior is ideal for this scenario.
 [2017-05-16 01:59 UTC] requinix@php.net
-Status: Not a bug +Status: Open -Type: Bug +Type: Documentation Problem
 [2017-05-16 01:59 UTC] requinix@php.net
The underscores are only applied to control characters - guaranteed to be invalid everywhere. The docs should clarify that "invalid" does not consider what is allowed in each component.
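
As a rough sketch of that rule (the URLs and control bytes below are made up for illustration, and details may differ between PHP versions), a control character in the host or path is replaced by an underscore:

```php
<?php
// Hypothetical inputs: \x01 and \x02 are control characters.
$hostCase = parse_url("http://exa\x01mple.com/");
$pathCase = parse_url("http://example.com/a\x02b");

// Each control character is replaced by "_" in the returned component.
var_dump($hostCase['host']);
var_dump($pathCase['path']);
```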

> not parse the URL in a nonsensical way
parse_url tries to break the string into pieces in the most reasonable way it can figure, mostly based on the presence of delimiters. Backslashes don't have significance, unlike : or / or ?, so they're ignored.
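
A small sketch of the difference, using the URL from the report and a hypothetical variant with a forward slash in the same position:

```php
<?php
// "/" is a delimiter: it ends the host and starts the path.
$slash = parse_url("https://www.example.com/.google.com");

// "\" is not a delimiter: it stays inside the host component.
$backslash = parse_url("https://www.example.com\\.google.com");

print_r($slash);
print_r($backslash);
```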

If you think that a "nonsensical way" is parsing a string without validation then a "sensical way" would be parsing it *with* validation, and parse_url is only designed to do half of that.

Parsing with validation is trivial:

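// Returns the parsed URL (or the requested component), or false if the URL fails validation.
function parse_valid_url($url, $component = -1) {
  return filter_var($url, FILTER_VALIDATE_URL) !== false ? parse_url($url, $component) : false;
}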
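
A usage sketch, with the helper repeated so the snippet runs on its own. The second URL is a made-up example, and the rejection of the first one assumes that FILTER_VALIDATE_URL checks the host of http(s) URLs, which may depend on the PHP version:

```php
<?php
// Helper from above, repeated here so this snippet is self-contained.
function parse_valid_url($url, $component = -1) {
    return filter_var($url, FILTER_VALIDATE_URL) !== false
        ? parse_url($url, $component)
        : false;
}

// The URL from the report: the backslash in the host should fail validation.
$rejected = parse_valid_url("https://www.example.com\\.google.com");

// A well-formed URL is passed through to parse_url() as usual.
$host = parse_valid_url("https://www.example.com/", PHP_URL_HOST);

var_dump($rejected);
var_dump($host);
```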
 [2017-05-16 02:25 UTC] bj dot cardon at gmail dot com
To be fair, I was thinking "sensible" would be to interpret the parts in the same way an error-correcting browser might interpret them. Otherwise, I'm not really sure what the purpose of a function like parse_url would be, since it could not even be relied on to return accurate URL parts on which to perform your own sanity checks.

I would agree that at the very least the documentation should be clearer, and you are correct that filter_var already provides a close enough approximation of what I'm looking for.
 [2021-09-24 16:12 UTC] cmb@php.net
-Status: Open +Status: Verified -Assigned To: +Assigned To: cmb
 [2021-09-24 16:12 UTC] cmb@php.net
> To be fair, I was thinking "sensible" would be to interpret the
> parts in the same way an error-correcting browser might interpret
> them.

I agree that it would be nice to have such a function, but it is
important to note that parse_url() also tries to parse incomplete
URLs, which browsers usually don't do.
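
For illustration, a sketch of what parsing incomplete URLs looks like (the inputs are hypothetical):

```php
<?php
// Scheme-relative URL: host, path and query are still recognised.
$schemeRelative = parse_url("//www.example.com/path?x=1");

// Path-only URL: path, query and fragment are returned.
$pathOnly = parse_url("/path?x=1#top");

// No scheme and no leading "//": the whole string is treated as a path.
$bare = parse_url("www.example.com/path");

print_r($schemeRelative);
print_r($pathOnly);
print_r($bare);
```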
 [2021-09-24 16:16 UTC] git@php.net
Automatic comment on behalf of cmb69
Revision: https://github.com/php/doc-en/commit/e95ae8d94cf549dc125b9fc75ece4d38a648a29c
Log: Fix #74599: parse_url allows bad characters in the common name
 [2021-09-24 16:16 UTC] git@php.net
-Status: Verified +Status: Closed
 [2021-09-24 16:21 UTC] git@php.net
Automatic comment on behalf of mumumu
Revision: https://github.com/php/doc-ja/commit/1dfcb999008aa9c23271eee95d8a48a63e8a34ac
Log: Fix #74599: parse_url allows bad characters in the common name
 