Doc Bug #60797 parse_url fragments on #[0-9]+;
Submitted: 2012-01-19 00:30 UTC Modified: 2012-01-25 00:03 UTC
From: buccinator at gmail dot com Assigned:
Status: Wont fix Package: URL related
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None
 [2012-01-19 00:30 UTC] buccinator at gmail dot com
Description:
------------
note: parse_url() starts the fragment at the # of numeric character references of the form &#[0-9]+; (for example &#39;) that appear within the query

Expected result:
----------------
that the filter be less greedy, or that the documentation describe this behaviour


 [2012-01-21 22:11 UTC] frozenfire@php.net
-Status: Open +Status: Feedback
 [2012-01-21 22:11 UTC] frozenfire@php.net
Could you please provide a test script showing the behaviour you're describing?
Based on what you've written, I do not know what needs to be documented.
 [2012-01-24 23:11 UTC] buccinator at gmail dot com
$url = 'http://example.com/key1=val1&key2=vals&#39;2s&key3=vals&apos;3#fragment';

echo '<xmp>';
var_dump(parse_url($url));
echo '</xmp>';

actual output:
array(4) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(11) "example.com"
  ["path"]=>
  string(21) "/key1=val1&key2=vals&"
  ["fragment"]=>
  string(31) "39;2s&key3=vals&apos;3#fragment"
}

expected output (unless documented):
array(4) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(11) "example.com"
  ["path"]=>
  string(44) "/key1=val1&key2=vals&#39;2s&key3=vals&apos;3"
  ["fragment"]=>
  string(8) "fragment"
}
 [2012-01-24 23:11 UTC] buccinator at gmail dot com
-Status: Feedback +Status: Open
 [2012-01-24 23:22 UTC] frozenfire@php.net
Hash characters are not valid within a URL, except to delimit a fragment. RFC 
1738 (http://www.ietf.org/rfc/rfc1738.txt) explicitly indicates under "2.2. URL 
Character Encoding Issues" that the hash character *must* be encoded for the URL 
to be valid.

The parse_url() reference indicates in several places that malformed URLs will 
likely incur issues. As such, I don't feel it's warranted to document each way a 
URL can be malformed.
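
To illustrate the point about encoding, here is a minimal sketch of percent-encoding stray hash characters as %23 before calling parse_url(). The helper name encodeStrayHashes() is hypothetical (it is not part of PHP), and it assumes that the last '#' in the string is the intended fragment delimiter, which will not hold for every input.

```php
<?php
// Sketch only: encodeStrayHashes() is a hypothetical helper, not part of PHP.
// Assumption: the last '#' in the string is the real fragment delimiter.
function encodeStrayHashes(string $url): string
{
    $last = strrpos($url, '#');
    if ($last === false) {
        return $url; // no '#' at all, nothing to encode
    }
    // Percent-encode every '#' that appears before the last one.
    return str_replace('#', '%23', substr($url, 0, $last)) . substr($url, $last);
}

$url = 'http://example.com/key1=val1&key2=vals&#39;2s&key3=vals&apos;3#fragment';

$before = parse_url($url);                    // splits at the first '#'
$after  = parse_url(encodeStrayHashes($url)); // only the last '#' remains

var_dump($before['fragment']); // string(31) "39;2s&key3=vals&apos;3#fragment"
var_dump($after['path']);      // string(46) "/key1=val1&key2=vals&%2339;2s&key3=vals&apos;3"
var_dump($after['fragment']);  // string(8) "fragment"
```

parse_url() does not decode percent-escapes, so the %23 stays in the returned path and can be turned back into '#' with rawurldecode() if needed.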
 [2012-01-24 23:22 UTC] frozenfire@php.net
-Status: Open +Status: Wont fix
 [2012-01-25 00:03 UTC] buccinator at gmail dot com
fair enough.

imho, as parse_url actually splits on the fragment (which i would assume isn't 
actually sent in the HTTP request for a URL), i expected parse_url() to work on 
URLs found in the wild (i.e. URLs intended for a browser's address bar or XHTML 
display), e.g. a URL captured from the href attribute of a page's anchor tag.

but again, i wouldn't have realised that a URL encoded with an HTML entity 
(regardless of proper context) would be considered malformed.
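
For URLs lifted from markup as described above, one possible approach is to decode the HTML entities before parsing. The sketch below assumes the string was copied verbatim from an href attribute and is UTF-8; the variable names are illustrative only.

```php
<?php
// Assumption: $href holds an attribute value copied verbatim from HTML markup,
// so it still contains HTML entities such as &#39; and &apos;.
$href = 'http://example.com/key1=val1&key2=vals&#39;2s&key3=vals&apos;3#fragment';

// ENT_QUOTES decodes quote entities; ENT_HTML5 makes &apos; a known named entity.
$decoded = html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// With the entities gone, the only '#' left is the fragment delimiter.
$parts = parse_url($decoded);

var_dump($parts['path']);     // string(35) "/key1=val1&key2=vals'2s&key3=vals'3"
var_dump($parts['fragment']); // string(8) "fragment"
```

This changes the data (the entities become literal apostrophes), so it only suits cases where the entity was HTML escaping rather than part of the intended URL.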
 