|  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #51050 FILTER_VALIDATE_URL accepts invalid URLs
Submitted: 2010-02-15 05:21 UTC Modified: 2010-02-15 10:01 UTC
From: pecoes at gmail dot com Assigned:
Status: Not a bug Package: Filter related
PHP Version: 5.3.1 OS: WinXP
Private report: No CVE-ID: None
View Add Comment Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
You can add a comment by following this link or if you reported this bug, you can edit this bug over here.
Block user comment
Status: Assign to:
Bug Type:
From: pecoes at gmail dot com
New email:
PHP Version: OS:


 [2010-02-15 05:21 UTC] pecoes at gmail dot com
Look at the code and its result. How is that validation?

Reproduce code:
$url = '"><script>alert(\'oops\');</script';
echo '<a href="', filter_var($url, FILTER_VALIDATE_URL), '">test</a>';

Expected result:
<a href="">test</a>

Actual result:
<a href=""><script>alert('oops');</script">test</a>


Add a Patch

Pull Requests

Add a Pull Request


AllCommentsChangesGit/SVN commitsRelated reports
 [2010-02-15 07:00 UTC]
validate != filter. There's nothing wrong in the url syntax so it's passed on. More in the manual:
 [2010-02-15 07:09 UTC] pecoes at gmail dot com
Seriously? Well then it might be a good idea to add a warning label to the documentation, that successful validation does not protect from XSS attacks.
 [2010-02-15 07:14 UTC]
I guess we could state it more strongly somewhere, but it does say:

Validation is used to validate or check if the data meets certain 
qualifications. For example, passing in FILTER_VALIDATE_EMAIL will 
determine if the data is a valid email address, but will not change the 
data itself.

Sanitization will sanitize the data, so it may alter it by removing 
undesired characters. For example, passing in FILTER_SANITIZE_EMAIL 
will remove characters that are inappropriate for an email address to 
contain. That said, it does not validate the data.

And just the name itself. VALIDATE_URL.  There is nothing invalid about 
the URL in your example.  
 [2010-02-15 07:20 UTC] pecoes at gmail dot com
Okay, I accept that the URL I posted, is valid. But it deserves pointing out that valid does NOT mean safe.

Btw FILTER_SANITIZE_URL has the same effect on this URL.
 [2010-02-15 07:26 UTC]
What you are after is a filter for the html-context.  There is nothing 
wrong with your URL.  You only have an issue with it if you use it in 
an HTML context.  It is your target context you should be filtering 
for.  The URL sanitizer is very explicitly documented as:

Remove all characters except letters, digits and $-

Have a look through:

What you are looking for is FILTER_SANITIZE_SPECIAL_CHARS

 [2010-02-15 07:39 UTC] pecoes at gmail dot com
Personally, I don't like sanitizing. I prefer to either accept or reject. I don't think modifying a user's input is a good idea.


Or an optional FILTER_FLAG_DISALLOW_SPECIAL_CHARS, if that's what you prefer...

Because, you know, using URLs in an HTML context is not such an exotic scenario. :0)
 [2010-02-15 07:52 UTC]
But that wouldn't make it safe and as such would be worse than the 
current state because people would think it was.  It takes more than 
just filtering out or encoding certain characters to make a user-
supplied URL safe for direct display.  Generally you need to apply some 
application-level logic to determine which domains and which primitives 
are valid.  For example: javascript:alert(1) is a perfectly valid URL 
that has no special characters in it.  Or file:/// can also cause 
problems.  Even http://localhost/ can cause issues.  There is a long 
list of things that can be problematic if you allow users to supply 
entire URLs.  Usually all you want to accept from users are url 
fragments and you prepend the primitive and base domain and path.
 [2010-02-15 07:58 UTC] pecoes at gmail dot com
Yes. You're perfectly right. Forget that last suggestion. But I still think, it would be helpful to clearly state in the documentation, that valid != safe.
 [2010-02-15 08:29 UTC]
It is hardly within the scope of the manual to cover all possible attack vectors. already states a number of things you should always consider when dealing with user submitted data.

Whether or not something is "safe" is entirely circumstantial and highly dependent on how trustworthy the user is. It's not the manual's job to figure that out, but your job as a programmer.
 [2010-02-15 10:01 UTC] pecoes at gmail dot com
The issue is not, what my job is.

The issue is, how difficult it is, to figure out, what my job is.

I did a bit of googling and apparently there are a number of people who use the URL-filter incorrectly - like I did. Others give detailed examples, how to properly test for validity, without detailing what "validity" means. And those, who are fully aware of this filter's pitfalls, recommend not to bother with it and use a regex instead.

This may end up as one of those things, that those in the know won't use and those who don't know, will shoot themselves in the foot with. Reminds me a bit of safe mode or magic quotes. It's a well-intentioned attempt to simplify things, that makes things more complicated, because it obscures the parts of the problem, it can't solve.

Maybe I'm operating from a sample of 1 here and so I will shut up now, but it did fool me.
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sun Feb 25 23:01:33 2024 UTC