Doc Bug #60797 parse_url fragments on #[0-9]+;
Submitted: 2012-01-19 00:30 UTC Modified: 2012-01-25 00:03 UTC
From: buccinator at gmail dot com Assigned:
Status: Wont fix Package: URL related
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None
 [2012-01-19 00:30 UTC] buccinator at gmail dot com
Description:
------------
Note: parse_url() splits off a fragment at numeric character references of the form &#[0-9]+; (for example &#39;) that appear within the query.

Expected result:
----------------
Either the parsing should be less greedy, or the documentation should describe this behaviour.


 [2012-01-21 22:11 UTC] frozenfire@php.net
-Status: Open +Status: Feedback
 [2012-01-21 22:11 UTC] frozenfire@php.net
Could you please provide a test script showing the behaviour you're describing? 
Based on what you've written, I do not know what needs to be documented.
 [2012-01-24 23:11 UTC] buccinator at gmail dot com
<?php
$url = 'http://example.com/key1=val1&key2=vals&#39;2s&key3=vals&apos;3#fragment';

echo '<xmp>';
var_dump(parse_url($url));
echo '</xmp>';

Actual output:
array(4) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(11) "example.com"
  ["path"]=>
  string(21) "/key1=val1&key2=vals&"
  ["fragment"]=>
  string(31) "39;2s&key3=vals&apos;3#fragment"
}

Expected output (unless documented):
array(4) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(11) "example.com"
  ["path"]=>
  string(44) "/key1=val1&key2=vals&#39;2s&key3=vals&apos;3"
  ["fragment"]=>
  string(8) "fragment"
}
 [2012-01-24 23:11 UTC] buccinator at gmail dot com
-Status: Feedback +Status: Open
 [2012-01-24 23:22 UTC] frozenfire@php.net
Hash characters are not valid within a URL, except to delimit a fragment. RFC 
1738 (http://www.ietf.org/rfc/rfc1738.txt) explicitly indicates under "2.2. URL 
Character Encoding Issues" that the hash character *must* be encoded for the URL 
to be valid.

The parse_url() reference notes in several places that malformed URLs are likely 
to cause problems. As such, I don't feel it's warranted to document each way a 
URL can be malformed.
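To illustrate the encoding requirement: when a '#' is part of a value rather than the fragment delimiter, percent-encoding it as %23 keeps parse_url() from splitting there. A minimal sketch, assuming the values are assembled in PHP and placed in a query string after '?' (the URL in this report has no '?', so its data lands in the path); the keys, values, and example.com URL are illustrative. http_build_query() with PHP_QUERY_RFC3986 encodes each key and value with rawurlencode(), so '&', '#' and ';' become %26, %23 and %3B.

```php
<?php
// Illustrative values; key2 contains the literal text of an HTML entity,
// including a "#" that would otherwise start a fragment.
$values = [
    'key1' => 'val1',
    'key2' => 'vals&#39;2s',
    'key3' => 'vals',
];

// Percent-encode every key and value (RFC 3986 style), so "#" becomes %23.
$query = http_build_query($values, '', '&', PHP_QUERY_RFC3986);
$url   = 'http://example.com/?' . $query . '#fragment';

$parts = parse_url($url);
echo $parts['query'], PHP_EOL;    // key1=val1&key2=vals%26%2339%3B2s&key3=vals
echo $parts['fragment'], PHP_EOL; // fragment

// parse_str() percent-decodes the query back into the original values.
parse_str($parts['query'], $decoded);
echo $decoded['key2'], PHP_EOL;   // vals&#39;2s
```

Only the real fragment delimiter remains unencoded, so the split happens where the URL author intended.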
 [2012-01-24 23:22 UTC] frozenfire@php.net
-Status: Open +Status: Wont fix
 [2012-01-25 00:03 UTC] buccinator at gmail dot com
Fair enough.

IMHO, since parse_url() actually splits off the fragment (which I assume isn't 
sent in the HTTP request for a URL), I expected parse_url() to work on URLs found 
in the wild (i.e. URLs intended for a browser's address bar or for XHTML display), 
e.g. a URL captured from the href attribute of an anchor tag on a page.

But again, I wouldn't have realised that a URL encoded with an HTML entity 
(regardless of proper context) would be considered malformed.
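For the href case above: a browser decodes HTML character references in an attribute value before it uses the value as a URL, so a scraped href can be decoded the same way before it is passed to parse_url(). A minimal sketch, assuming the raw attribute text is already in a string; the $href value is hypothetical and mirrors the URL from this report. ENT_HTML5 is passed so that &apos; is decoded as well as &#39; (with the default ENT_HTML401 table, &apos; is left alone).

```php
<?php
// Hypothetical raw attribute text, as it might appear in <a href="...">.
$href = 'http://example.com/key1=val1&key2=vals&#39;2s&key3=vals&apos;3#fragment';

// Decode character references first, as a browser does for attribute values.
// "&key2" and "&key3" have no terminating ";" and are left unchanged.
$url = html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');

$parts = parse_url($url);
echo $parts['path'], PHP_EOL;     // /key1=val1&key2=vals'2s&key3=vals'3
echo $parts['fragment'], PHP_EOL; // fragment
```

After decoding, the only '#' left is the real fragment delimiter, which is the split the reporter expected.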
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 21 17:01:58 2024 UTC