Bug #79178 [parse_url] underscore at end of host
Submitted: 2020-01-27 19:22 UTC Modified: 2021-06-09 15:27 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: progr-d at yandex dot ru Assigned:
Status: Not a bug Package: *URL Functions
PHP Version: 7.3.14 OS: macOS 10.13.6
Private report: No CVE-ID: None
 [2020-01-27 19:22 UTC] progr-d at yandex dot ru
Description:
------------
While processing a file read with the `file` function, I ran into behaviour I did not expect here: parse_url() turns the trailing newline of each line into an underscore at the end of the host.
Yes, I know I should have used the FILE_IGNORE_NEW_LINES flag, but once the underscore showed up it took me a long time to work out where it came from.
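
For reference, a minimal sketch of the situation described above. The temporary file and its two URLs are made up for illustration; only `file`, `FILE_IGNORE_NEW_LINES` and `parse_url` come from the report itself.

```php
<?php
// Hypothetical input: a temporary file with one URL per line,
// created here so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'urls');
file_put_contents($path, "http://yandex.ru\nhttp://php.net\n");

// Without flags, every element returned by file() keeps its line ending.
$raw = file($path);

// With FILE_IGNORE_NEW_LINES the line ending is stripped,
// so parse_url() never sees a control character.
$clean = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$hosts = array_map(function ($url) {
    return parse_url($url, PHP_URL_HOST);
}, $clean);

var_dump($hosts);
unlink($path);
```

If the flag cannot be used, calling rtrim() on each line before parsing should have the same effect.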

Test script:
---------------
var_dump(parse_url('http://yandex.ru' . PHP_EOL, PHP_URL_HOST));

Expected result:
----------------
string(9) "yandex.ru"

or 

Exception

Actual result:
--------------
string(10) "yandex.ru_"

 [2020-03-01 00:24 UTC] nawarian at gmail dot com
The following pull request has been associated:

Patch Name: Fix #79178 parse_url check control chars on host
On GitHub:  https://github.com/php/php-src/pull/5225
Patch:      https://github.com/php/php-src/pull/5225.patch
 [2020-03-01 00:31 UTC] nawarian at gmail dot com
While parsing many parts of the URL (scheme, user, pass, host, fragment, query, path), PHP behaves this way: whenever it finds a control character in that part (as reported by C's `iscntrl()`), it replaces the character with an underscore.
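
As a rough illustration, the effect on a single component can be approximated in userland as below. This is only a sketch: the helper name `replace_control_chars` is hypothetical, and the character class assumes `iscntrl()` in the C locale (bytes 0x00-0x1F and 0x7F); it is not the actual C implementation.

```php
<?php
// Hypothetical helper: replace every ASCII control character with "_",
// approximating what parse_url() does to each component it returns.
function replace_control_chars(string $part): string
{
    // 0x00-0x1F and 0x7F are the control characters in the C locale.
    return preg_replace('/[\x00-\x1F\x7F]/', '_', $part);
}

var_dump(replace_control_chars("yandex.ru\n")); // string(10) "yandex.ru_"
```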

Why? I don't know...

I've submitted a pull request that fixes this behaviour for the "host" part only. The same issue will keep happening in every other URL part I mentioned above.

I wouldn't mind fixing those as well, but that sounds like a bigger change that affects the API.

Thoughts?
 [2021-06-09 15:27 UTC] patrickallaert@php.net
-Status: Open
+Status: Not a bug
 [2021-06-09 15:27 UTC] patrickallaert@php.net
The "_" has been used to prevent CR/LF injection issues, see: https://github.com/php/php-src/commit/184323cbe5ca9407acc41f421800df360757c6f6

I don't really know what you should expect from parse_url() when you give it an invalid URL.

Expecting "yandex.ru" while the host "yandex.ru\n" has been provided feels wrong to me.

What would you expect from:

var_dump(parse_url("http://yan\ndex.ru\n", PHP_URL_HOST));

for example?

IMHO, your case and the one above could equally return bool(false).

I am closing this issue (and the related PR) because it isn't really a bug, it has not evolved in 17 months, and there is no clear consensus on what to expect from parsing invalid URLs.

You are very welcome to re-open this issue and/or its PR if you have more thoughts about it.
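
For anyone who prefers the bool(false) outcome discussed above, it can be had in userland with a small wrapper. A minimal sketch: the function name `parse_url_strict` is hypothetical, and treating bytes 0x00-0x1F and 0x7F as "control characters" is an assumption.

```php
<?php
// Hypothetical wrapper: refuse to parse any URL that contains an ASCII
// control character instead of letting parse_url() rewrite it to "_".
function parse_url_strict(string $url, int $component = -1)
{
    if (preg_match('/[\x00-\x1F\x7F]/', $url) === 1) {
        return false;
    }
    return parse_url($url, $component);
}

var_dump(parse_url_strict("http://yandex.ru\n", PHP_URL_HOST));    // bool(false)
var_dump(parse_url_strict("http://yan\ndex.ru\n", PHP_URL_HOST));  // bool(false)
var_dump(parse_url_strict("http://yandex.ru", PHP_URL_HOST));      // string(9) "yandex.ru"
```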
 