Doc Bug #68296 \n in path with parse_url() converts to underscore
Submitted: 2014-10-24 06:14 UTC
Modified: 2019-08-05 17:31 UTC
Votes:5
Avg. Score:3.6 ± 0.8
Reproduced:4 of 4 (100.0%)
Same Version:1 (25.0%)
Same OS:0 (0.0%)
From: dave at lyte dot id dot au
Assigned:
Status: Verified
Package: URL related
PHP Version: 5.4.34
OS: OS X
Private report: No
CVE-ID: None

 [2014-10-24 06:14 UTC] dave at lyte dot id dot au
Description:
------------
When you parse arbitrary HTML, you will sometimes find that users have put newlines in the href attribute of <a> tags. Browsers ignore the newline, but PHP's parse_url() converts it to an underscore.

Test script:
---------------
$ php -r 'var_export(parse_url("http://example.com/foo\nbar"));'

Expected result:
----------------
The expected output is:
$ php -r 'var_export(parse_url("http://example.com/foo\nbar"));'
array (
  'scheme' => 'http',
  'host' => 'example.com',
  'path' => '/foobar',
)

Actual result:
--------------
The actual output is:
$ php -r 'var_export(parse_url("http://example.com/foo\nbar"));'
array (
  'scheme' => 'http',
  'host' => 'example.com',
  'path' => '/foo_bar',
)
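
A possible userland workaround, sketched under the assumption that browser-like handling is what is wanted: remove tab, line feed and carriage return characters from the string before calling parse_url(). The helper name strip_tab_newline() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): drop tab, LF and CR from a URL
// string, roughly as browsers do with href values, before parsing it.
function strip_tab_newline(string $url): string
{
    return str_replace(["\t", "\n", "\r"], '', $url);
}

// With the newline removed first, parse_url() never sees a control character.
$parts = parse_url(strip_tab_newline("http://example.com/foo\nbar"));
var_export($parts);
```

This only covers the three characters named above; any other control character in the input is still subject to parse_url()'s own handling.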

History

 [2016-01-15 08:11 UTC] simonsimcity at gmail dot com
Is related to #68296
 [2016-01-15 08:11 UTC] simonsimcity at gmail dot com
Sorry, I meant related to #52923
 [2018-01-11 19:01 UTC] gonzad26 at gmail dot com
The problem still persists in php 5.6.
Doing: parse_url($path,PHP_URL_HOST);

If $path contains a "\n", parse_url() returns an _ where the \n was.
 [2018-01-11 19:05 UTC] spam2 at rhsoft dot net
Besides, PHP 5 is EOL. What do you expect?
Garbage in, garbage out!

Control chars don't belong in a URL. It's that simple, and if you don't sanitize and validate your input data, be happy that someone does.

What do you think happens with null chars and friends if you concatenate your remote URL from user input and obviously don't care about handling untrusted data carefully?
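
The validation suggested here could look roughly like the following sketch, which refuses to parse any string containing an ASCII control byte instead of letting it be rewritten. The wrapper name parse_url_strict() is hypothetical, and treating bytes 0x00-0x1F and 0x7F as "control chars" is an assumption.

```php
<?php
// Hypothetical wrapper (not a PHP built-in): reject a URL that contains
// any ASCII control byte (0x00-0x1F or 0x7F) before it reaches parse_url().
function parse_url_strict(string $url)
{
    if (preg_match('/[\x00-\x1F\x7F]/', $url) === 1) {
        return false; // same failure value parse_url() uses for malformed URLs
    }
    return parse_url($url);
}

var_export(parse_url_strict("http://example.com/foo\nbar")); // false
```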
 [2018-01-11 20:25 UTC] gonzad26 at gmail dot com
Pardon me, bug still in PHP 7.1.7.
 [2019-08-05 17:31 UTC] cmb@php.net
-Status: Open
+Status: Verified
-Type: Bug
+Type: Documentation Problem
 [2019-08-05 17:31 UTC] cmb@php.net
This is not a bug, but rather deliberate albeit undocumented
behavior, namely that all control characters in a string passed to
parse_url() are replaced by underscores[1].  Therefore I'm
changing to doc problem.

If you want to have that behavior changed, feel free to start the
RFC process[2].

[1] <https://github.com/php/php-src/blob/6326717dccf9e85dc41b3778692b790e2501fb2a/ext/standard/url.c#L59>
[2] <https://wiki.php.net/rfc/howto>
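
For illustration, the replacement described in this comment can be imitated in userland. This is only a sketch, not the code parse_url() runs: the function name replace_control_chars() is hypothetical, and it assumes "control character" means the bytes 0x00-0x1F and 0x7F.

```php
<?php
// Userland illustration only; parse_url() does its replacement internally.
// Assumption: "control character" means bytes 0x00-0x1F and 0x7F.
function replace_control_chars(string $s): string
{
    return preg_replace('/[\x00-\x1F\x7F]/', '_', $s);
}

echo replace_control_chars("/foo\nbar"), PHP_EOL;  // /foo_bar
echo replace_control_chars("/a\tb\x7Fc"), PHP_EOL; // /a_b_c
```

Comparing a string with its replace_control_chars() result is one way to detect, before parsing, that parse_url() would alter a component.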
 
Last updated: Mon Oct 21 04:01:26 2019 UTC