Bug #39898 FILTER_VALIDATE_URL validates \r\n\t etc.
Submitted: 2006-12-20 11:41 UTC Modified: 2006-12-20 19:20 UTC
From: soenke dot ruempler at northclick dot de Assigned: iliaa (profile)
Status: Closed Package: Filter related
PHP Version: 5.2.0 OS: Linux
Private report: No CVE-ID: None
 [2006-12-20 11:41 UTC] soenke dot ruempler at northclick dot de
Description:
------------
FILTER_VALIDATE_URL accepts strings containing CR, LF and TAB. I don't know whether some RFC allows this in theory, but in practice it makes the URL filter completely unusable.

Additionally, it would be nice if the filter were more restrictive by default. Requiring the scheme and host parts is essential in 99,999999% of use cases. More useful would be flags like FILTER_FLAG_SCHEME_NOT_REQUIRED, FILTER_FLAG_HOST_NOT_REQUIRED ...

Reproduce code:
---------------
$ php -r "var_dump(filter_var(\"blah\n\n\t\rblub??\", FILTER_VALIDATE_URL));"



Expected result:
----------------
bool(false)


Actual result:
--------------
string(14) "blah

blub??"

 [2006-12-20 13:02 UTC] pajoye@php.net
It uses parse_url without concatenating the result back together. If we make it work like parse_url, whitespace will be replaced by '_', which is not a good thing either.

However, given the RFC, we should simply ignore them (see "E. Recommendations for Delimiting URI in Context" in the URI or URL RFC).

The fix will be committed once we agree on the best choice.
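
For illustration, a small sketch of why parse_url alone does not settle the question: it splits a string into components and is not a validator, so control characters have to be detected with a separate check. The variable names and the regular expression are assumptions made for this example, not code from ext/filter.

```php
<?php
// Input taken from the reproduce code: embedded LF, TAB and CR.
$input = "blah\n\n\t\rblub??";

// parse_url() only splits the string into components; inspect what it returns.
$parts = parse_url($input);
var_dump($parts);

// An explicit check for ASCII control characters (0x00-0x1F and 0x7F).
$hasControlChars = preg_match('/[\x00-\x1F\x7F]/', $input) === 1;
var_dump($hasControlChars); // bool(true)
```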
 [2006-12-20 14:07 UTC] soenke dot ruempler at northclick dot de
Hi Pierre,

I mean, for validation (and not for sanitization) strings with "invalid" characters like newlines should simply not be validated. Regardless of what any RFC says or could say: if my application checks a URL provided by user input and relies on ext/filter, there could be many issues (for example, passing the "valid" output to critical functions such as system or header).

FILTER_VALIDATE_URL is not intuitive as it stands. Every developer expects "blahblub" to be invalid. FILTER_VALIDATE_URL should also have scheme and host required by default (with optional flags to turn that OFF).
 [2006-12-20 14:13 UTC] pajoye@php.net
"I mean, for validation (and not for sanitization) strings with "invalid" characters like newlines should simply not be validated"

Yes, however the question is whether these characters should be considered valid and ignored, or invalid so that the filter returns false. The same applies if they are trailing characters. I'm in favour of returning false except for trailing chars (with trim applied by default).

"FILTER_VALIDATE_URL is not intuitive by now. Every developer expects "blahblub" to be not valid. FILTER_VALIDATE_URL should have scheme and host required by default (and optional flags to turn it OFF), too."

I agree.

 [2006-12-20 15:15 UTC] soenke dot ruempler at northclick dot de
OK, that sounds good. I take away the following points:

* Don't rely on parse_url()
* A URL with special characters is NOT valid
* A URL with whitespace at the start or end is valid, and the whitespace is trimmed in the returned string.
* Scheme and host are required by default ("blahblub" is not valid, at least "asdf://blahblub" is needed)
* Maybe FLAGS to turn scheme and host validation OFF (if someone finds it useful).
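
The points above can be sketched in userland roughly as follows. This is a minimal illustration, not the ext/filter implementation: the function name `strict_validate_url` is hypothetical, and the regular expression is a deliberately simplified assumption about what "scheme and host present" means (it does not check the full URI grammar).

```php
<?php
/**
 * Hypothetical helper (not part of ext/filter) following the points above.
 *
 * @return string|false The trimmed URL, or false if it is rejected.
 */
function strict_validate_url(string $input)
{
    // Whitespace at the start or end is tolerated and trimmed.
    $url = trim($input);

    // Any remaining whitespace or control character makes the URL invalid.
    if ($url === '' || preg_match('/[\x00-\x20\x7F]/', $url) === 1) {
        return false;
    }

    // Require "scheme://" followed by a non-empty host part.
    // Simplified pattern, used instead of parse_url().
    if (preg_match('~^[a-z][a-z0-9+.-]*://[^/?#]+~i', $url) !== 1) {
        return false;
    }

    return $url;
}

var_dump(strict_validate_url("blah\n\n\t\rblub??"));   // bool(false)
var_dump(strict_validate_url("blahblub"));              // bool(false)
var_dump(strict_validate_url("  asdf://blahblub \n")); // string(15) "asdf://blahblub"
```

Returning the trimmed string on success mirrors how filter_var returns the value for a passing validation filter, so the helper can be used as a drop-in check before passing the URL on.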
 [2006-12-20 18:12 UTC] pajoye@php.net
Assigned to Ilia; he has the time to do it (he is working on it now).

The decision is:
- remove the possibility of host & scheme being optional
- no option to make them optional (what you have is not a URL then)
- trim


 [2006-12-20 19:20 UTC] iliaa@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 