Doc Bug #81332 FILTER_VALIDATE_URL does not validate URNs or general URIs
Submitted: 2021-08-05 07:00 UTC Modified: 2021-08-05 22:02 UTC
From: tamas dot nagy0404 at outlook dot com Assigned:
Status: Open Package: URL related
PHP Version: Irrelevant OS: all
Private report: No CVE-ID: None
 [2021-08-05 07:00 UTC] tamas dot nagy0404 at outlook dot com
Description:
------------
Hi,

It looks like `filter_var('urn:isbn:0451450523', FILTER_VALIDATE_URL);` does not validate `urn:` URIs properly.

`urn:` URIs are mentioned in http://www.faqs.org/rfcs/rfc2396.html, but the function simply marks them as invalid, whereas it should validate them according to http://www.faqs.org/rfcs/rfc2141.html, which is referenced from RFC 2396.

Related comment: https://www.php.net/manual/en/filter.filters.validate.php#110411

Tests: https://3v4l.org/SOd9X#veol

Test script:
---------------
https://3v4l.org/SOd9X#veol

Expected result:
----------------
I would expect urn: URIs to be validated properly according to http://www.faqs.org/rfcs/rfc2141.html


 [2021-08-05 07:09 UTC] requinix@php.net
-Status: Open
+Status: Not a bug
 [2021-08-05 07:09 UTC] requinix@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

https://www.php.net/manual/en/filter.filters.validate.php
> Validates value as URL (according to » http://www.faqs.org/rfcs/rfc2396),
> optionally with required components. Beware a valid URL may not specify the HTTP
> protocol http:// so further validation may be required to determine the URL uses
> an expected protocol, e.g. ssh:// or mailto:. Note that the function will only
> find ASCII URLs to be valid; internationalized domain names (containing non-
> ASCII characters) will fail.

FILTER_VALIDATE_URL validates URLs, not URIs.
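
The "further validation" that the manual text calls for could look roughly like the sketch below. It is only an illustration: the helper name `is_url_with_scheme` and the allowlist of schemes are hypothetical and not part of PHP or of this report.

```php
<?php
// Hypothetical helper: accept a value only if FILTER_VALIDATE_URL passes
// AND its scheme is on a caller-supplied allowlist.
function is_url_with_scheme(string $value, array $allowedSchemes): bool
{
    // First let the filter extension decide whether this is a URL at all.
    if (filter_var($value, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Then extract the scheme and compare it case-insensitively.
    $scheme = parse_url($value, PHP_URL_SCHEME);
    if (!is_string($scheme)) {
        return false;
    }

    return in_array(strtolower($scheme), $allowedSchemes, true);
}

var_dump(is_url_with_scheme('http://example.com/', ['http', 'https']));
var_dump(is_url_with_scheme('mailto:user@example.com', ['http', 'https']));
```

Keeping the scheme check separate from `filter_var()` means the allowlist stays an application decision rather than something the filter has to guess.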
 [2021-08-05 07:23 UTC] tamas dot nagy0404 at outlook dot com
I think it is still a bug (or a documentation error). The manual entry for this function refers to http://www.faqs.org/rfcs/rfc2396.html, which is about URIs regardless of what the function is called, so the manual should specify either that only URLs are validated or that URIs are validated too.

Also, for example, `news:comp.infosystems.www.servers.unix` is not a URL either, yet it passes validation.
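
A minimal script to reproduce the difference described in this thread might look like the following. The `http://example.com/` entry is an extra control value added for comparison and is not from the original report; according to the thread, the `news:` value is accepted and the `urn:` value is rejected.

```php
<?php
// Values taken from this report, plus one ordinary http URL as a control.
$candidates = [
    'http://example.com/',
    'news:comp.infosystems.www.servers.unix',
    'urn:isbn:0451450523',
];

// filter_var() returns the (string) value on success and false on failure.
$results = [];
foreach ($candidates as $candidate) {
    $results[$candidate] = filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}

var_dump($results);
```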
 [2021-08-05 07:54 UTC] requinix@php.net
> I think it is still a bug (or misdocumentation), you are referring to the
> http://www.faqs.org/rfcs/rfc2396.html in your manual for this function
> regardless what it is called which is about URIs, so it should either specify
> that only URLs are being validated or URIs too.

RFC 2396 defines the hierarchical format of a URL, which is why it's mentioned. Read section 1.2 to understand the difference between a URL and a URN.

> Also... for example `news:comp.infosystems.www.servers.unix` is not a URL either
> however it passes the validation regardless.

Take another look:

Section 1.2
> The term "Uniform Resource Locator" (URL) refers to the subset of URI that
> identify resources via a representation of their primary access mechanism (e.g.,
> their network "location"), rather than identifying the resource by name or by
> some other attribute(s) of that resource.

https://en.wikipedia.org/wiki/URL
> A Uniform Resource Locator (URL), colloquially termed a web address, is a
> reference to a web resource that specifies its location on a computer network
> and a mechanism for retrieving it. A URL is a specific type of Uniform Resource
> Identifier (URI), although many people use the two terms interchangeably. URLs
> occur most commonly to reference web pages (http), but are also used for file
> transfer (ftp), email (mailto), database access (JDBC), and many other
> applications.

So actually yes, it *is* a URL. It may not be the type of URL that most people are familiar with, like those starting with "http(s)" or containing a "www" or ".com", but it still is one.
 [2021-08-05 08:15 UTC] tamas dot nagy0404 at outlook dot com
I see, right. I've also checked in the meantime.

Maybe the manual should emphasize that it does *not* validate URIs in general, only the URLs defined in the RFC? It seems that the actual difference between URLs and URIs, and why the filter won't validate URIs, is not clear to everyone (it wasn't to me either, it is now :) ).

This issue came up with a URI validator in a PHP library that uses this function and fails to validate URIs because it misuses FILTER_VALIDATE_URL for this purpose.

Also, maybe there should be a filter for URIs, such as FILTER_VALIDATE_URI? Should I open a feature request for that?

Sorry for the inconvenience, and thank you very much for your explanation.
 [2021-08-05 22:02 UTC] requinix@php.net
-Summary: urn: URIs are not validated properly
+Summary: FILTER_VALIDATE_URL does not validate URNs or general URIs
-Status: Not a bug
+Status: Open
-Type: Bug
+Type: Documentation Problem
 [2021-08-05 22:02 UTC] requinix@php.net
> Maybe it should be emphasized in the manual that it does *not* validate an URI
> only URLs from the RFC?

Fair.

If you have an idea for what it should say, the documentation is open source and accepts pull requests on GitHub...
https://github.com/php/doc-en/blob/master/reference/filter/constants.xml

> Also there should be maybe a filter function for URIs such as FILTER_VALIDATE_URI?

The problem with URIs is that they're simply <scheme>:<extra> and the extra portion varies: "tel" looks one way and "http" looks another way and "news" is different still, and that means complexity about how to validate various schemes and which ones should even be allowed...

Validating URNs seems more reasonable, but is there a need for that? They aren't even that complex: "urn:" + identifier + ":" + mostly arbitrary characters, and explode() can handle that quite easily. I'd probably use a regular expression for it.
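
As a rough illustration of the regular-expression approach, a userland check could be sketched like this. The function name `looks_like_urn` is hypothetical, and the pattern is a simplified reading of the RFC 2141 syntax (a namespace identifier of up to 32 letters, digits, or hyphens, followed by a non-empty namespace-specific string); it is an assumption-laden sketch, not a complete validator, and it does not check the rules of any particular namespace such as ISBN.

```php
<?php
// Hypothetical helper: loosely checks the "urn:" + NID + ":" + NSS shape.
function looks_like_urn(string $value): bool
{
    // NID: one letter/digit, then up to 31 letters, digits, or hyphens.
    // NSS: one or more allowed characters or %XX escapes.
    $pattern = '/\Aurn:[a-z0-9][a-z0-9-]{0,31}:'
             . '(?:[a-z0-9()+,\-.:=@;$_!*\']|%[0-9a-f]{2})+\z/i';

    return preg_match($pattern, $value) === 1;
}

var_dump(looks_like_urn('urn:isbn:0451450523'));
var_dump(looks_like_urn('http://example.com/'));
```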

IMO this is one of those situations where PHP *could* do it, but there's a lot of complexity and nuances and not a whole lot of need, so an implementation is probably better left up to the community.
 