Doc Bug #74599 parse_url allows bad characters in the common name
Submitted: 2017-05-15 18:51 UTC Modified: 2021-09-24 16:12 UTC
From: bj dot cardon at gmail dot com Assigned: cmb (profile)
Status: Closed Package: URL related
PHP Version: 7.0.19 OS: Linux
Private report: No CVE-ID: None
 [2017-05-15 18:51 UTC] bj dot cardon at gmail dot com
Description:
------------
As the title says, the parse_url function allows backslashes to be in the hostname part of a URL and considers it valid. You can see the test script below showing this behavior.

Test script:
---------------
bcardon@bcardon-base:~$ php -a
Interactive mode enabled

php > $u = parse_url("https://www.example.com\\.google.com");
php > print_r($u);
Array
(
    [scheme] => https
    [host] => www.example.com\.google.com
)

Expected result:
----------------
The expected behavior is that invalid characters (including backslashes) will cause parse_url to return FALSE as with any invalid URL.


History

 [2017-05-15 23:56 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2017-05-15 23:56 UTC] requinix@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

> This function is not meant to validate the given URL, it only breaks it up into
> the above listed parts.
 [2017-05-16 01:31 UTC] bj dot cardon at gmail dot com
I don't need "validation". I need a function that claims to parse a URL to not parse the URL in a nonsensical way.

The documentation also says:

> Invalid characters are replaced by _

That is not happening with the invalid character '\'.
 [2017-05-16 01:37 UTC] bj dot cardon at gmail dot com
On a side note, perhaps returning FALSE is not the right expected behavior. Even so, I don't think the current behavior is ideal for this scenario.
 [2017-05-16 01:59 UTC] requinix@php.net
-Status: Not a bug +Status: Open -Type: Bug +Type: Documentation Problem
 [2017-05-16 01:59 UTC] requinix@php.net
The underscores are only applied to control characters - guaranteed to be invalid everywhere. The docs should clarify that "invalid" does not consider what is allowed in each component.

> not parse the URL in a nonsensical way
parse_url tries to break the string into pieces in the most reasonable way it can figure out, mostly based on the presence of delimiters. Backslashes have no significance as delimiters, unlike : or / or ?, so they are left in place as ordinary characters.
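
As a rough illustration of the two points above, the sketch below parses one URL with a control character (0x01) in the host and one with a backslash in the host. The variable names and the first URL are made up for this example; the second URL is the one from the report.

```php
<?php
// Hypothetical inputs: a control character (0x01) versus a backslash in the host.
$withControl   = parse_url("https://www.exa\x01mple.com/");
$withBackslash = parse_url("https://www.example.com\\.google.com");

// The control character is expected to come back as an underscore.
var_dump($withControl['host']);

// The backslash is not a delimiter, so it stays in the host unchanged.
var_dump($withBackslash['host']);
```

Neither call returns false, since parse_url does not check which characters are allowed in each component.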

If you think that a "nonsensical way" is parsing a string without validation then a "sensical way" would be parsing it *with* validation, and parse_url is only designed to do half of that.

Parsing with validation is trivial:

// Parse $url only if filter_var() accepts it as a valid URL; otherwise return false.
function parse_valid_url($url, $component = -1) {
  return filter_var($url, FILTER_VALIDATE_URL) ? parse_url($url, $component) : false;
}
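
A short usage sketch of the wrapper above, repeated here so the snippet runs on its own. The first URL is the one from the report; the second URL and the variable names are assumptions made for illustration.

```php
<?php
// Wrapper from the comment above: validate first, then parse.
function parse_valid_url($url, $component = -1) {
  return filter_var($url, FILTER_VALIDATE_URL) ? parse_url($url, $component) : false;
}

// The reported URL: the backslash makes the host fail hostname validation.
$rejected = parse_valid_url("https://www.example.com\\.google.com");

// A well-formed URL (hypothetical) is parsed as usual.
$host = parse_valid_url("https://www.example.com/path", PHP_URL_HOST);

var_dump($rejected); // bool(false)
var_dump($host);     // string(15) "www.example.com"
```

Because FILTER_VALIDATE_URL requires a scheme and a host, this wrapper also rejects relative or otherwise incomplete URLs that plain parse_url would accept.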
 [2017-05-16 02:25 UTC] bj dot cardon at gmail dot com
To be fair, I was thinking "sensible" would be to interpret the parts in the same way an error-correcting browser might interpret them. Otherwise, I'm not really sure what the purpose of a function like parse_url would be, since it could not reliably be used even to get accurate URL parts on which to perform your own sanity checks.

I would agree that at the very least the documentation should be more clear, and you are correct that filter_var already provides a close enough approximation of what I'm looking for.
 [2021-09-24 16:12 UTC] cmb@php.net
-Status: Open +Status: Verified -Assigned To: +Assigned To: cmb
 [2021-09-24 16:12 UTC] cmb@php.net
> To be fair, I was thinking "sensible" would be to interpret the
> parts in the same way an error-correcting browser might interpret
> them.

I agree that it would be nice to have such a function, but it is
important to note that parse_url() also tries to parse incomplete
URLs, which browsers usually don't do.
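
A minimal sketch of what parsing incomplete URLs looks like in practice. The three input strings and the variable names are made up for this example.

```php
<?php
// Hypothetical incomplete inputs: no scheme, no host, or neither.
$schemeRelative = parse_url("//www.example.com/path?x=1");
$pathOnly       = parse_url("/path?x=1");
$noDelimiters   = parse_url("www.example.com/path");

print_r($schemeRelative); // host, path and query, but no scheme
print_r($pathOnly);       // path and query only
print_r($noDelimiters);   // without "//", the whole string is treated as a path
```

The last case shows why the result has to be read with care: with no scheme or leading "//", nothing marks www.example.com as a host, so it ends up in the path component.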
 [2021-09-24 16:16 UTC] git@php.net
Automatic comment on behalf of cmb69
Revision: https://github.com/php/doc-en/commit/e95ae8d94cf549dc125b9fc75ece4d38a648a29c
Log: Fix #74599: parse_url allows bad characters in the common name
 [2021-09-24 16:16 UTC] git@php.net
-Status: Verified +Status: Closed
 [2021-09-24 16:21 UTC] git@php.net
Automatic comment on behalf of mumumu
Revision: https://github.com/php/doc-ja/commit/1dfcb999008aa9c23271eee95d8a48a63e8a34ac
Log: Fix #74599: parse_url allows bad characters in the common name
 