Bug #21226 function parse_url() fails
Submitted: 2002-12-27 18:35 UTC
Modified: 2002-12-30 10:44 UTC
Votes: 1
Avg. Score: 3.0 ± 0.0
Reproduced: 1 of 1 (100.0%)
Same Version: 1 (100.0%)
Same OS: 1 (100.0%)
From: dev at lechat dot org
Assigned: iliaa
Status: Closed
Package: *URL Functions
PHP Version: 4.3.0
OS: w2000
Private report: No
CVE-ID: None
 [2002-12-27 18:35 UTC] dev at lechat dot org
The script below:

$url="http://user:password@host.dom.gtld:portnumber?foo";
$p_url=parse_url($url);
print_r($p_url);

fails with the following output:

Warning: parse_url(http://user:password@host.dom.gtld:portnumber?foo) [function.parse-url]: Unable to parse url in C:\script\t.php on line 2

This function worked in PHP 4.2.
It does not work with 4.3.0RC4.


 [2002-12-27 18:47 UTC] dev at lechat dot org
It does not work on 4.3.0RC4 or on 4.3.0 final either.
 [2002-12-27 18:49 UTC] edink@php.net
Verified on Windows and Linux ZTS builds.

Assigning to Ilia.
 [2002-12-27 19:00 UTC] edink@php.net
Some more info: the port has to be numeric and no more than 5 digits long.

The bug seems to be that parse_url() doesn't recognize the end of the port part if the '/' is missing. So

$url="http://user:password@host.dom.gtld:80/?foo";

works as expected.
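
The difference between the two forms can be checked side by side. This is only a minimal sketch, written for PHP 7 or later rather than the 4.x releases discussed here; describe_parse() is a hypothetical helper name, and the numeric port 80 stands in for the reporter's "portnumber" placeholder.

```php
<?php
// Hypothetical helper: report whether parse_url() accepted the URL.
function describe_parse(string $url): string
{
    // "@" silences any parse warning; false means the URL was rejected.
    $parts = @parse_url($url);
    if ($parts === false) {
        return "FAILED: $url";
    }
    return "OK: $url => port " . ($parts['port'] ?? '(none)');
}

// With a "/" between the port and the query string.
echo describe_parse('http://user:password@host.dom.gtld:80/?foo'), "\n";

// Without the "/": the form that triggered the warning in 4.3.0.
echo describe_parse('http://user:password@host.dom.gtld:80?foo'), "\n";
```

On a release that contains the fix for this report, both lines should start with "OK"; on 4.3.0 the second form is the one that was rejected.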


 [2002-12-27 19:04 UTC] dev at lechat dot org
The problem seems to come from the 'port' part of the URL.

If we consider this:
user:password@host?foo
works fine,

user:password@host:port?foo
fails

but

user:password@host?foo:port

works and returns the port number in $p_url[port], which is not correct.
 [2002-12-27 19:21 UTC] dev at lechat dot org
Adding a '/' after the end of the port part is a good workaround. Thanks.
Do you consider this a bug, or is parse_url() compliant with the URL RFC?
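
The workaround of adding a '/' after the port can be applied mechanically before calling parse_url(). The sketch below is one possible way to do it, assuming a scheme://authority style URL and PHP 7 or later; ensure_root_path() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper: insert "/" as the path when a URL has none.
function ensure_root_path(string $url): string
{
    $sep = strpos($url, '://');
    if ($sep === false) {
        return $url; // not a scheme://authority URL, leave it untouched
    }

    // The authority runs until the next "/", "?" or "#", or to the end.
    $authorityStart = $sep + 3;
    $end = $authorityStart + strcspn($url, '/?#', $authorityStart);

    if ($end < strlen($url) && $url[$end] === '/') {
        return $url; // a path is already present
    }

    // No path: put "/" right after the authority.
    return substr($url, 0, $end) . '/' . substr($url, $end);
}

echo ensure_root_path('http://user:password@host.dom.gtld:80?foo'), "\n";
```

For the URL above this prints http://user:password@host.dom.gtld:80/?foo, which is the form reported to parse correctly on 4.3.0.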
 [2002-12-28 00:39 UTC] iliaa@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

The port can only be a number from 0 to 99999 (although in reality the port range is 1-65535). A non-numeric port number is clearly not valid, which invalidates the passed URL.
The 2nd example is also wrong: without the '/' between the port and the rest of the request, the code MUST assume that the following data is part of the port, so the URL is once again not valid.
This is NOT a bug.
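
The gap between what the parser is described as accepting (any number of up to five digits) and the real port range can be made explicit with a stricter check in user code. This is a rough sketch for PHP 7 or later; is_valid_port() is a hypothetical helper, not part of PHP, and the 1-65535 bounds are the ones quoted above.

```php
<?php
// Hypothetical helper: accept only ports in the real range 1-65535.
function is_valid_port(string $port): bool
{
    // One to five ASCII digits, nothing else (\z forbids a trailing newline).
    if (preg_match('/^[0-9]{1,5}\z/', $port) !== 1) {
        return false;
    }
    $n = (int) $port;
    return $n >= 1 && $n <= 65535;
}

var_dump(is_valid_port('8080'));       // bool(true)
var_dump(is_valid_port('portnumber')); // bool(false): not numeric
var_dump(is_valid_port('99999'));      // bool(false): five digits but above 65535
```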
 [2002-12-28 09:10 UTC] dev at lechat dot org
Thank you for your work on my report, but I'm surprised you marked this report as bogus, since:
- 'port' was not a number in my example, but that was only to make it easier to read. The warning is the same with a numeric port.
- The same example worked without a warning in 4.2 and earlier.
- The parse_url() manual page says nothing about a slash being required before the path or query part of a URL (I triple-checked ;-)
- RFC 2396 (Uniform Resource Identifiers (URI): Generic Syntax) doesn't specify that you *MUST* have a / after the port part.

So, considering these points, you should either modify the parse_url() function or modify its documentation. But please do not just mark the report as bogus!!!
At worst, set it to 'closed'.

Thanks for your help.
 [2002-12-30 02:03 UTC] jmcastagnetto@php.net
Reopening this bug. A closer look at RFC 2396 indicates that:

"... This "generic URI" syntax consists of a sequence of four main components:

   <scheme>://<authority><path>?<query>
...
   absoluteURI   = scheme ":" ( hier_part | opaque_part )

URI that are hierarchical in nature use the slash "/" character for
separating hierarchical components. ...
...
   hier_part     = ( net_path | abs_path ) [ "?" query ]
   net_path      = "//" authority [ abs_path ]
   abs_path      = "/"  path_segments

URI that do not make use of the slash "/" character for separating
hierarchical components are considered opaque by the generic URI
parser.

   opaque_part   = uric_no_slash *uric
   uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                   "&" | "=" | "+" | "$" | ","
..."

Later, section 3.3 of that RFC clarifies the syntax of the path component. Section 3.2 makes a similar clarification about what counts as a correct authority component.

Bottom line: the $url given by the bug reporter mostly conforms to the hierarchical URI syntax, although it is not the usual case. Section 3.2, which deals with the authority component, states that:

"... The authority component is preceded by a double slash "//" and is
terminated by the next slash "/", question-mark "?", or by the end of
the URI.  Within the authority component, the characters ";", ":",
"@", "?", and "/" are reserved. ..."

That is reinforced in the BNF syntax later in the RFC. I am not sure whether all web servers will correctly interpret a URL without a path but with a query part immediately after the authority part, given that the "/" in the path is usually mapped internally by the server to wherever the physical files are in the filesystem.
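
The rule that the authority ends at the next "/", "?", or end of the URI can be tried out with the component-splitting regular expression from Appendix B of RFC 2396. The sketch below only illustrates that rule and is not how parse_url() is implemented; split_uri() is a hypothetical helper name, the code assumes PHP 7 or later, and the numeric port 80 replaces the reporter's placeholder.

```php
<?php
// Hypothetical helper: split a URI with the RFC 2396 Appendix B regex.
function split_uri(string $uri): array
{
    // "~" is the delimiter so that "/" and "#" need no escaping.
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($pattern, $uri, $m);

    // Trailing groups that took no part in the match are absent from $m.
    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

print_r(split_uri('http://user:password@host.dom.gtld:80?foo'));
```

For this URL the authority comes out as user:password@host.dom.gtld:80, the path is empty, and the query is foo, so the "?" ends the authority even though no "/" is present.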

The following code works as expected:

$url = "http://user:passwd@www.example.com:8080/foo.php?bar=1&boom=0";
print_r(parse_url($url));

Giving as output:

Array
(
    [scheme] => http
    [host] => www.example.com
    [port] => 8080
    [user] => user
    [pass] => passwd
    [path] => /foo.php
    [query] => bar=1&boom=0
)

Tested w/ current CVS head on a RH Linux 6.1 machine:

$ php_cvs -v
PHP 4.4.0-dev (cli) (built: Dec 27 2002 14:00:56)
Copyright (c) 1997-2002 The PHP Group
Zend Engine v1.4.0, Copyright (c) 1998-2002 Zend Technologies

as well as 4.3.0 (on the same OS)

$ php -v
PHP 4.3.0 (cli) (built: Dec 29 2002 23:59:53)
Copyright (c) 1997-2002 The PHP Group
Zend Engine v1.3.0, Copyright (c) 1998-2002 Zend Technologies
 [2002-12-30 02:31 UTC] pollita@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions. 

Thank you for your interest in PHP.

There are two things wrong with your $url value:

1) "portnumber" is not a valid port number; it must be a number.

2) There must be a "/" between the port number and ANYTHING following. This also includes the "?foo" query string in your case, where the document requested is the directory index.
 [2002-12-30 04:48 UTC] gp at ccf dot fr
I have the same problem!
The point is not the port number or RFC compliance; it is simply that this worked in previous versions of PHP, and since 4.3.0 the behavior is not the same!
 [2002-12-30 09:31 UTC] iliaa@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

http://user:password@host.dom.gtld:portnumber?foo
is NOT a valid url because it is missing a path, end of story.
If you try to make a request to a web server without a path you will get an error, same is true for ftp.
 [2002-12-30 10:44 UTC] iliaa@php.net
This bug has been fixed in CVS.

In case this was a PHP problem, snapshots of the sources are packaged
every three hours; this change will be in the next snapshot. You can
grab the snapshot at http://snaps.php.net/.
 
In case this was a documentation problem, the fix will show up soon at
http://www.php.net/manual/.

In case this was a PHP.net website problem, the change will show
up on the PHP.net site and on the mirror sites in short time.
 
Thank you for the report, and for helping us make PHP better.

My thanks to dev@lechat.org for being persistent enough to see this matter through. 
 