Bug #79178 [parse_url] underscore at end of host
Submitted: 2020-01-27 19:22 UTC Modified: 2021-06-09 15:27 UTC
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: progr-d at yandex dot ru
Assigned:
Status: Not a bug
Package: *URL Functions
PHP Version: 7.3.14
OS: macOS 10.13.6
Private report: No
CVE-ID: None


 [2020-01-27 19:22 UTC] progr-d at yandex dot ru
While processing a file with the `file()` function, I ran into this behavior, which I did not expect here.
Yes, I know that I should use the FILE_IGNORE_NEW_LINES flag, but once I got an underscore, it took me a long time to work out where it came from.

Test script:
var_dump(parse_url('' . PHP_EOL, PHP_URL_HOST));

Expected result:
string(10) ""



Actual result:
string(10) "yandex.ru_"
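
For context, here is a minimal sketch of how a trailing newline can reach parse_url() when URLs are read with file(), and how the FILE_IGNORE_NEW_LINES flag mentioned in the report avoids it. The helper name `hosts_from_file` and the `example.com` / `example.org` URLs are placeholders for illustration, not taken from the report.

```php
<?php
// Hypothetical helper: read one URL per line and extract each host.
// $flags is passed straight to file(); 0 keeps the line endings.
function hosts_from_file(string $path, int $flags = 0): array
{
    $hosts = [];
    foreach (file($path, $flags) as $line) {
        $hosts[] = parse_url($line, PHP_URL_HOST);
    }
    return $hosts;
}

// Placeholder input: a temporary file holding two example URLs.
$path = tempnam(sys_get_temp_dir(), 'urls');
file_put_contents($path, "http://example.com\nhttp://example.org\n");

// Without the flag, each line still ends in "\n", so the newline is
// handed to parse_url() as part of the host.
$raw = hosts_from_file($path);

// With the flag, the newline is stripped before parse_url() sees the line.
$clean = hosts_from_file($path, FILE_IGNORE_NEW_LINES);

unlink($path);
```

Comparing `$raw` with `$clean` should make the source of the unexpected character visible; calling `rtrim()` on each line before parsing would presumably work as well.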


 [2020-03-01 00:24 UTC] nawarian at gmail dot com
The following pull request has been associated:

Patch Name: Fix #79178 parse_url check control chars on host
On GitHub:
 [2020-03-01 00:31 UTC] nawarian at gmail dot com
While parsing most parts of the URL (scheme, user, pass, host, fragment, query, path), PHP behaves this way: whenever it finds a control character in one of these parts (as reported by cctype::iscntrl), it replaces that character with an underscore.

Why? I don't know...

I've submitted a pull request that fixes the behaviour described above for the "host" part only. The same issue will keep happening in every other URL part I listed.

I wouldn't mind fixing those as well, but that sounds like a bigger change that affects the API.
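
The replacement described in this comment can be emulated in userland roughly as follows. This is a sketch under the assumption that "control character" means the ASCII range 0x00-0x1F plus 0x7F (what iscntrl reports in the C locale); the function name `replace_control_chars` is hypothetical and this is not PHP's internal code.

```php
<?php
// Userland emulation (an assumption, not PHP's internal implementation):
// replace every ASCII control character with an underscore.
function replace_control_chars(string $part): string
{
    return preg_replace('/[\x00-\x1F\x7F]/', '_', $part);
}

// The host from the report, with the trailing newline still attached.
var_dump(replace_control_chars("yandex.ru\n")); // string(10) "yandex.ru_"
```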

 [2021-06-09 15:27 UTC]
-Status: Open +Status: Not a bug
 [2021-06-09 15:27 UTC]
The "_" has been used to prevent CR/LF injection issues, see:

I don't really know what you should expect from parse_url() when you give it an invalid URL.

Expecting "" while the host "\n" has been provided feels wrong to me.

What would you expect from:

var_dump(parse_url("http://yan\\n", PHP_URL_HOST));

for example?

To me, your case and the one above could equally give: bool(false) IMHO.

I am closing this issue (and the related PR) because it isn't really a bug, it has not evolved in 17 months, and there is no clear outcome of what to expect from parsing invalid URLs.

You are very welcome to re-open this issue and/or its PR if you have more thoughts about it.
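
Callers who would prefer the bool(false) outcome floated above can get it today with a small wrapper. The following is only a sketch: `parse_url_strict` is a hypothetical name, the `example.com` URL is a placeholder, and treating 0x00-0x1F and 0x7F as the rejected set is an assumption rather than anything decided in this report.

```php
<?php
// Hypothetical wrapper: reject input containing ASCII control characters
// up front instead of passing it to parse_url().
function parse_url_strict(string $url, int $component = -1)
{
    if (preg_match('/[\x00-\x1F\x7F]/', $url) === 1) {
        return false;
    }
    return parse_url($url, $component);
}

var_dump(parse_url_strict("http://example.com\n", PHP_URL_HOST)); // bool(false)
```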