Doc Bug #74599 parse_url allows bad characters in the common name
Submitted: 2017-05-15 18:51 UTC Modified: 2017-05-16 02:25 UTC
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (100.0%)
From: bj dot cardon at gmail dot com Assigned:
Status: Open Package: URL related
PHP Version: 7.0.19 OS: Linux
Private report: No CVE-ID: None

 [2017-05-15 18:51 UTC] bj dot cardon at gmail dot com
As the title says, parse_url allows backslashes in the hostname part of a URL and treats the result as valid. The test script below shows this behavior.

Test script:
bcardon@bcardon-base:~$ php -a
Interactive mode enabled

php > $u = parse_url("\\");
php > print_r($u);
    [scheme] => https
    [host] =>\

Expected result:
The expected behavior is that invalid characters (including backslashes) should cause parse_url to return FALSE, as it does for any invalid URL.


 [2017-05-15 23:56 UTC]
-Status: Open +Status: Not a bug
 [2017-05-15 23:56 UTC]
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at and the instructions on how to report
a bug at

> This function is not meant to validate the given URL, it only breaks it up into
> the above listed parts.
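To make the quoted documentation concrete: parse_url() returns false only for what the manual calls seriously malformed URLs, such as an http URL with an empty host. A backslash inside the host is not treated that way, so an array comes back. This is a minimal illustration with placeholder inputs, assuming PHP 7.0 or later.

```php
<?php
// parse_url() gives up (returns false) only on seriously malformed input,
// e.g. an http URL whose host is empty. A backslash in the host does not
// count as malformed, so the host is returned with the backslash in it.
$emptyHost     = parse_url('http:///example.com');
$backslashHost = parse_url('http://ex\\ample.com/'); // '\\' is one backslash in single quotes

var_dump($emptyHost);             // bool(false)
var_dump($backslashHost['host']); // string(12) "ex\ample.com"
```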
 [2017-05-16 01:31 UTC] bj dot cardon at gmail dot com
I don't need "validation". I need a function that claims to parse a URL not to parse it in a nonsensical way.

The documentation also says:

> Invalid characters are replaced by _

That is not happening for the invalid character '\'.
 [2017-05-16 01:37 UTC] bj dot cardon at gmail dot com
On a side note, perhaps returning FALSE is not the right behavior either; however, I don't think the current behavior is ideal for this scenario.
 [2017-05-16 01:59 UTC]
-Status: Not a bug +Status: Open -Type: Bug +Type: Documentation Problem
 [2017-05-16 01:59 UTC]
The underscores are applied only to control characters, which are guaranteed to be invalid everywhere. The docs should clarify that "invalid" does not take into account what is allowed in each component.
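A short illustration of the underscore rule described above, using a placeholder URL: control characters are rewritten to "_", while a backslash, which is a printable character, passes through unchanged. It assumes the default C locale, where iscntrl() covers bytes 0x00 to 0x1F and 0x7F.

```php
<?php
// parse_url() replaces control characters (bytes for which C's iscntrl() is
// true, such as "\x01" or "\x7f") with "_" in the components it returns.
// Printable characters that are invalid in a host, such as "\", are kept.
$parts = parse_url("http://exa\x01mple.com/a\x7fb\\c");

echo $parts['host'], "\n"; // exa_mple.com
echo $parts['path'], "\n"; // /a_b\c
```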

> not parse the URL in a nonsensical way

parse_url tries to break the string into pieces in the most reasonable way it can, based mostly on the presence of delimiters. Backslashes have no significance, unlike :, / or ?, so parse_url ignores them and leaves them in whichever component they fall in.
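A sketch of the delimiter-driven splitting described above, using a placeholder URL that exercises each component, with one backslash placed in the path. Nothing here validates characters; the delimiters alone decide where each piece starts and ends.

```php
<?php
// parse_url() finds each component by its delimiters: ":" ends the scheme,
// "//" opens the authority (where "@" and ":" split off user, password and
// port), "/" starts the path, "?" the query and "#" the fragment.
// "\" is not a delimiter, so it simply stays inside the path here.
$parts = parse_url('https://user:pw@example.com:8080/dir\\file.php?q=1#top');
print_r($parts);
```

Note that the port comes back as an integer, while every other component is a string.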

If you think that a "nonsensical way" is parsing a string without validation, then a "sensical way" would be parsing it *with* validation, and parse_url is only designed to do half of that.

Parsing with validation is trivial:

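function parse_valid_url($url, $component = -1) {
  // Only split URLs that FILTER_VALIDATE_URL accepts; otherwise return false, as parse_url does on failure.
  return filter_var($url, FILTER_VALIDATE_URL) ? parse_url($url, $component) : false;
}
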
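For comparison with the wrapper above, this sketch runs FILTER_VALIDATE_URL on its own against a well-formed placeholder URL and one with a backslash in the host. It assumes a PHP version whose FILTER_VALIDATE_URL applies hostname rules to http and https hosts (PHP 7.x does), so only the first URL should be accepted.

```php
<?php
// FILTER_VALIDATE_URL also checks the host of http and https URLs against
// hostname rules (dot-separated labels of letters, digits and "-"), which
// is exactly the check parse_url() leaves out.
foreach (['http://example.com/', 'http://ex\\ample.com/'] as $url) {
    $valid = filter_var($url, FILTER_VALIDATE_URL) !== false;
    echo $url, ' => ', $valid ? 'valid' : 'rejected', "\n";
}
```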
 [2017-05-16 02:25 UTC] bj dot cardon at gmail dot com
To be fair, I was thinking "sensible" would mean interpreting the parts the same way an error-correcting browser might. Otherwise, I'm not really sure what the purpose of a function like parse_url would be, since it can't necessarily be relied on to return accurate URL parts on which to perform your own sanity checks.
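The "error-correcting browser" reading can be approximated for one case. Browsers that follow the WHATWG URL Standard treat "\" as "/" in the authority and path of special schemes such as http and https, but not in the query or fragment. The function parse_url_browserish() below is hypothetical (it is not part of PHP) and is only a minimal sketch that applies that single rule before handing the string to parse_url(); it ignores every other browser fix-up.

```php
<?php
// HYPOTHETICAL helper, not part of PHP: approximates one WHATWG URL rule,
// namely that "\" acts like "/" before the query and fragment of http(s) URLs.
// Other browser fix-ups (whitespace trimming, percent-encoding, IDNA,
// other special schemes) are deliberately left out.
function parse_url_browserish(string $url, int $component = -1)
{
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));

    if ($scheme === 'http' || $scheme === 'https') {
        // Length of the part before the first "?" or "#".
        $cut = strcspn($url, '?#');
        // Turn backslashes into slashes only in that leading part; the query
        // and fragment keep their backslashes, as they do in browsers.
        $url = str_replace('\\', '/', substr($url, 0, $cut)) . substr($url, $cut);
    }

    return parse_url($url, $component);
}

// Placeholder input with backslashes where a browser would accept them.
print_r(parse_url_browserish('https:\\\\example.com\\path?x=a\\b'));
```

A string substitution like this is not a real URL parser; it only shows that the behavior described here follows a different parsing model from the delimiter-only one parse_url uses.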

I would agree that, at the very least, the documentation should be clearer, and you are correct that filter_var already provides a close enough approximation of what I'm looking for.
PHP Copyright © 2001-2021 The PHP Group
All rights reserved.
Last updated: Fri Jul 30 22:01:25 2021 UTC