Bug #79178 [parse_url] underscore at end of host
Submitted: 2020-01-27 19:22 UTC Modified: 2021-06-09 15:27 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: progr-d at yandex dot ru Assigned:
Status: Not a bug Package: *URL Functions
PHP Version: 7.3.14 OS: macOS 10.13.6
Private report: No CVE-ID: None
 [2020-01-27 19:22 UTC] progr-d at yandex dot ru
Description:
------------
While processing the lines of a file read with the `file()` function, I ran into this behavior, which I did not expect here.
Yes, I know that the FILE_IGNORE_NEW_LINES flag should be used, but once I got an underscore in the result, I could not figure out for a long time where it came from.

Test script:
---------------
var_dump(parse_url('http://yandex.ru' . PHP_EOL, PHP_URL_HOST));

Expected result:
----------------
string(9) "yandex.ru"

or 

Exception

Actual result:
--------------
string(10) "yandex.ru_"
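
A minimal sketch of the workaround hinted at in the description, assuming the URLs arrive as lines that still carry their line ending (as `file()` returns them without FILE_IGNORE_NEW_LINES). The helper name `host_of_line` is hypothetical and only serves the illustration; it strips trailing CR/LF before handing the string to `parse_url()`.

```php
<?php
// Hypothetical helper, not part of PHP: extract the host from one raw line.
function host_of_line(string $line)
{
    // Strip only trailing CR/LF so no control character reaches parse_url().
    $url = rtrim($line, "\r\n");

    return parse_url($url, PHP_URL_HOST);
}

// Lines as file() would return them without FILE_IGNORE_NEW_LINES.
$lines = ["http://yandex.ru\n", "http://php.net\r\n"];

foreach ($lines as $line) {
    var_dump(host_of_line($line));
}
```

Passing FILE_IGNORE_NEW_LINES to `file()` has the same effect for the line ending itself; trimming explicitly is just a second option when the lines come from somewhere else.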

 [2020-03-01 00:24 UTC] nawarian at gmail dot com
The following pull request has been associated:

Patch Name: Fix #79178 parse_url check control chars on host
On GitHub:  https://github.com/php/php-src/pull/5225
Patch:      https://github.com/php/php-src/pull/5225.patch
 [2020-03-01 00:31 UTC] nawarian at gmail dot com
While parsing the parts of a URL (scheme, user, pass, host, fragment, query, path), PHP behaves this way: whenever it finds a control character in a part (as tested by C's `iscntrl()`), it replaces that character with an underscore.

Why? I don't know...

I've submitted a pull request that fixes this behaviour for the "host" part only. The same issue will keep happening in every other URL part I mentioned above.

I wouldn't mind fixing those as well, but that sounds like a bigger change that affects the API.

Thoughts?
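
To make the substitution described in this comment concrete, here is a userland sketch. It is not the php-src implementation: the function name `replace_control_chars` is hypothetical, and the character class assumes ASCII control characters only (0x00-0x1F and 0x7F), which is what `iscntrl()` matches in the "C" locale.

```php
<?php
// Hypothetical helper, not part of PHP: mimics the described substitution
// of control characters with underscores in a single URL part.
function replace_control_chars(string $part): string
{
    // Assumption: only ASCII control characters (0x00-0x1F, 0x7F) are replaced.
    return preg_replace('/[\x00-\x1F\x7F]/', '_', $part);
}

var_dump(replace_control_chars("yandex.ru\n"));   // string(10) "yandex.ru_"
var_dump(replace_control_chars("yan\ndex.ru\n")); // string(11) "yan_dex.ru_"
```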
 [2021-06-09 15:27 UTC] patrickallaert@php.net
-Status: Open
+Status: Not a bug
 [2021-06-09 15:27 UTC] patrickallaert@php.net
The "_" has been used to prevent CR/LF injection issues, see: https://github.com/php/php-src/commit/184323cbe5ca9407acc41f421800df360757c6f6

I don't really know what you should expect from parse_url() when you give it an invalid URL.

Expecting "yandex.ru" while the host "yandex.ru\n" has been provided feels wrong to me.

What would you expect from:

var_dump(parse_url("http://yan\ndex.ru\n", PHP_URL_HOST));

for example?

IMHO, your case and the one above could equally give bool(false).

I am closing this issue (and the related PR) because it isn't really a bug, it has not evolved in 17 months, and there is no clear outcome of what to expect from parsing invalid URLs.

You are very welcome to re-open this issue and/or its PR if you have more thoughts about it.
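
For callers who would prefer the bool(false) outcome discussed above, one possible approach is a thin wrapper that rejects such input before parsing. This is only a sketch: `parse_url_strict` is a hypothetical name, not an existing PHP function, and treating any ASCII control character as invalid is an assumption made for the example.

```php
<?php
// Hypothetical wrapper, not an existing PHP function.
function parse_url_strict(string $url, int $component = -1)
{
    // Assumption: any ASCII control character (including CR/LF) makes the URL invalid.
    if (preg_match('/[\x00-\x1F\x7F]/', $url) === 1) {
        return false;
    }

    return parse_url($url, $component);
}

var_dump(parse_url_strict("http://yandex.ru\n", PHP_URL_HOST));   // bool(false)
var_dump(parse_url_strict("http://yan\ndex.ru\n", PHP_URL_HOST)); // bool(false)
var_dump(parse_url_strict("http://yandex.ru", PHP_URL_HOST));     // string(9) "yandex.ru"
```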
Last updated: Tue Dec 10 08:01:27 2024 UTC