Doc Bug #68296 \n in path with parse_url() converts to underscore
Submitted: 2014-10-24 06:14 UTC
Modified: 2019-08-05 17:31 UTC
Votes:5
Avg. Score:3.6 ± 0.8
Reproduced:4 of 4 (100.0%)
Same Version:1 (25.0%)
Same OS:0 (0.0%)
From: dave at lyte dot id dot au
Assigned:
Status: Closed
Package: URL related
PHP Version: 5.4.34
OS: OS X
Private report: No
CVE-ID: None
 [2014-10-24 06:14 UTC] dave at lyte dot id dot au
Description:
------------
When parsing arbitrary HTML, you will sometimes find that users have put newlines inside the href attribute of <a> tags. Browsers ignore the newline, but parse_url() converts it to an underscore.

Test script:
---------------
$ php -r 'var_export(parse_url("http://example.com/foo\nbar"));'

Expected result:
----------------
The expected output is:
$ php -r 'var_export(parse_url("http://example.com/foo\nbar"));'
array (
  'scheme' => 'http',
  'host' => 'example.com',
  'path' => '/foobar',
)

Actual result:
--------------
The actual output is:
$ php -r 'var_export(parse_url("http://example.com/foo\nbar"));'
array (
  'scheme' => 'http',
  'host' => 'example.com',
  'path' => '/foo_bar',
)
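
A possible workaround, sketched under the assumption that browser-like handling is what is wanted: strip ASCII tab, line feed, and carriage return from the string before passing it to parse_url(). The helper name parse_url_like_browser() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): remove ASCII tab, LF and CR
// from the input before parsing, roughly as browsers do with href values.
function parse_url_like_browser(string $url)
{
    $cleaned = str_replace(["\t", "\n", "\r"], '', $url);

    // parse_url() returns an array of components, or false on seriously
    // malformed input, so callers should still check the return value.
    return parse_url($cleaned);
}

var_export(parse_url_like_browser("http://example.com/foo\nbar"));
```

With this pre-processing the 'path' component comes back as '/foobar' rather than '/foo_bar'. Other control characters are left untouched by this sketch and would still be replaced by parse_url().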

 [2016-01-15 08:11 UTC] simonsimcity at gmail dot com
Is related to #68296
 [2016-01-15 08:11 UTC] simonsimcity at gmail dot com
Sorry, I meant related to #52923
 [2018-01-11 19:01 UTC] gonzad26 at gmail dot com
The problem still persists in PHP 5.6.
Doing: parse_url($path, PHP_URL_HOST);

If $path contains a "\n", parse_url() returns an _ where the \n was.
 [2018-01-11 19:05 UTC] spam2 at rhsoft dot net
Besides, PHP 5 is EOL. What do you expect?
Garbage in, garbage out!

Control chars don't belong in a URL, it's that simple. If you don't sanitize and validate your input data, be happy that someone does.

What do you think happens with null chars and friends if you build your remote URL from user input and obviously don't care about handling untrusted data carefully?
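
One way to do the validation suggested above, as a minimal sketch: reject any candidate URL that contains an ASCII control character before it ever reaches parse_url(). The helper name has_control_chars() is hypothetical, and the byte range used here (0x00-0x1F plus 0x7F) is an assumption about what should count as a control character.

```php
<?php
// Hypothetical helper (not a PHP built-in): true if the string contains
// any ASCII control character (0x00-0x1F or 0x7F).
function has_control_chars(string $url): bool
{
    return preg_match('/[\x00-\x1F\x7F]/', $url) === 1;
}

$candidates = [
    "http://example.com/foobar",
    "http://example.com/foo\nbar",
];

foreach ($candidates as $candidate) {
    // json_encode() is used only to print the newline visibly as \n.
    echo json_encode($candidate, JSON_UNESCAPED_SLASHES), ' => ',
        has_control_chars($candidate) ? 'rejected' : 'accepted', "\n";
}
```

Rejecting such input outright, instead of silently rewriting it, keeps the decision about how to treat malformed URLs in the calling code.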
 [2018-01-11 20:25 UTC] gonzad26 at gmail dot com
Pardon me, bug still in PHP 7.1.7.
 [2019-08-05 17:31 UTC] cmb@php.net
-Status: Open +Status: Verified
-Type: Bug +Type: Documentation Problem
 [2019-08-05 17:31 UTC] cmb@php.net
This is not a bug, but rather deliberate albeit undocumented
behavior, namely that all control characters in a string passed to
parse_url() are replaced by underscores[1].  Therefore I'm
changing to doc problem.

If you want to have that behavior changed, feel free to start the
RFC process[2].

[1] <https://github.com/php/php-src/blob/6326717dccf9e85dc41b3778692b790e2501fb2a/ext/standard/url.c#L59>
[2] <https://wiki.php.net/rfc/howto>
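
The replacement described above can be observed with a short script. This is only an illustrative sketch: the URL and the choice of tab, line feed, and carriage return as sample control characters are assumptions made for the example, and the $seen array exists purely to collect the results.

```php
<?php
// Collect what parse_url() returns when a control character appears in
// the path and in the query string of an otherwise ordinary URL.
$seen = [];

foreach (["\t", "\n", "\r"] as $c) {
    $parts = parse_url("http://example.com/foo{$c}bar?x={$c}y");
    $seen[$c] = $parts;

    // Print the byte value in hex next to the parsed components.
    printf("0x%02X => path=%s query=%s\n", ord($c), $parts['path'], $parts['query']);
}
```

On the PHP versions discussed in this report, each control character should show up as an underscore in the printed components, so the replacement is not limited to "\n" or to the path.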
 [2021-09-24 15:56 UTC] git@php.net
Automatic comment on behalf of cmb69
Revision: https://github.com/php/doc-en/commit/416b9e9bdb397c47bfadcf1a11669e0239942848
Log: Fix #68296: \n in path with parse_url() converts to underscore
 [2021-09-24 15:56 UTC] git@php.net
-Status: Verified
+Status: Closed
 [2021-09-24 16:10 UTC] git@php.net
Automatic comment on behalf of mumumu
Revision: https://github.com/php/doc-ja/commit/1f5ca0f911fd1797c9398753ba9d24f6e01d38d6
Log: Fix #68296: \n in path with parse_url() converts to underscore
 