Request #53474 FILTER_VALIDATE_URL should not fail URLs that use IDN
Submitted: 2010-12-05 00:28 UTC
Modified: 2020-11-05 13:03 UTC
Votes:17
Avg. Score:4.0 ± 1.3
Reproduced:13 of 13 (100.0%)
Same Version:9 (69.2%)
Same OS:9 (69.2%)
From: gunther at keryx dot se
Assigned:
Status: Open
Package: Filter related
PHP Version: Irrelevant
OS: All
Private report: No
CVE-ID: None
 [2010-12-05 00:28 UTC] gunther at keryx dot se
Description:
------------
If a URL uses an IDN, it should be considered valid.

The test script outputs three blank lines. It should output the URLs.

Examples from http://mathiasbynens.be/demo/url-regex


Test script:
---------------
<?php
echo filter_var('http://✪df.ws/123', FILTER_VALIDATE_URL);
echo "\n";
echo filter_var('http://⌘.ws/', FILTER_VALIDATE_URL);
echo "\n";
echo filter_var('http://➡.ws/䨹', FILTER_VALIDATE_URL);
echo "\n";



Expected result:
----------------
http://✪df.ws/123
http://⌘.ws/
http://➡.ws/䨹



Actual result:
--------------
\n
\n
\n

 [2010-12-05 10:46 UTC] mathias at qiwi dot be
I was trying to post a comment here, but it wouldn’t let me (“ERROR: Please do not 
SPAM our bug system”), probably because of all the “URLs” I was trying to post.

Oh well, here’s the comment I tried to post: 
http://paste.pocoo.org/raw/VBO3Gn0vIipLQPRIgAEB/
 [2010-12-05 19:21 UTC] rasmus@php.net
mathias, you are not testing the current code.  Try it with 5.3.3.
 [2010-12-05 21:12 UTC] gunther at keryx dot se
My initial test was done using PHP 5.3.3.

The lack of IDN support is still present in 5.3.3.

All of Mathias' additional test cases except the last one are handled correctly, however. That is, this one is accepted:

http://a.b--c.de/

It should fail. That's probably an edge case, though. The lack of IDN support is the worst problem.
 [2010-12-06 10:42 UTC] aharvey@php.net
To be completely clear, filter_var(..., FILTER_VALIDATE_URL) already
handles internationalised domain names in their canonical Punycode
form. I guess what this is really going to boil down to is whether
filter_var() should also support IDN when it's represented in a
Unicode character set, given that's really an intermediate
representation for the user's benefit.
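
A minimal sketch of the distinction described above, assuming that xn--df-oiy.ws is the Punycode form of ✪df.ws (the URLs are for illustration only):

```php
<?php
// The same host written twice: once in Punycode (ASCII), once in Unicode.
// "\u{272A}" is ✪; the escape avoids depending on the file encoding (PHP 7+).
$punycodeUrl = 'http://xn--df-oiy.ws/123';
$unicodeUrl  = "http://\u{272A}df.ws/123";

// The ASCII form passes validation and is returned unchanged.
$asciiResult = filter_var($punycodeUrl, FILTER_VALIDATE_URL);

// The Unicode form is rejected, which is the behavior this report is about.
$unicodeResult = filter_var($unicodeUrl, FILTER_VALIDATE_URL);

var_dump($asciiResult);   // string(24) "http://xn--df-oiy.ws/123"
var_dump($unicodeResult); // bool(false)
```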

My initial feeling on that is "not by default", but I guess we could
look at a flag for that filter (FILTER_FLAG_IDN_UTF8 or similar) which
enabled support for a particular character set (probably UTF-8).

Thoughts?
 [2010-12-07 01:22 UTC] cataphract@php.net
IDN refers only to the domain name. http://➡.ws/䨹 also includes non-ASCII characters in the path portion.

That part is bogus. http://aaa.ws/䨹 is not a proper, unambiguous URL: how 䨹 is percent-encoded depends on the character encoding (%E4%A8%B9 for UTF-8, but no standard requires it). As for IDNs, I agree with aharvey. I suppose we could add a new flag, but the current behavior is not exactly a bug. This is more of a feature request.
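
To illustrate the ambiguity in the path, here is a small sketch. The UTF-16BE byte sequence is hard-coded as an assumption for comparison; it is not something the filter itself produces.

```php
<?php
// 䨹 is U+4A39. The escape yields its UTF-8 bytes regardless of file encoding (PHP 7+).
$char = "\u{4A39}";

// rawurlencode() percent-encodes whatever raw bytes it is given.
$utf8Encoded  = rawurlencode($char);       // UTF-8 bytes: E4 A8 B9
$utf16Encoded = rawurlencode("\x4A\x39");  // same character as UTF-16BE bytes: 4A 39

var_dump($utf8Encoded);  // string(9) "%E4%A8%B9"
var_dump($utf16Encoded); // string(2) "J9" (both bytes happen to be unreserved ASCII)

// A raw non-ASCII path is rejected; the percent-encoded path is accepted.
$rawResult     = filter_var('http://aaa.ws/' . $char, FILTER_VALIDATE_URL);
$encodedResult = filter_var('http://aaa.ws/' . $utf8Encoded, FILTER_VALIDATE_URL);

var_dump($rawResult);     // bool(false)
var_dump($encodedResult); // string(23) "http://aaa.ws/%E4%A8%B9"
```

The two encodings give different URLs for the same character, so a validator cannot tell which one the author of the unencoded URL intended.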
 [2014-04-18 02:24 UTC] scif-1986 at ya dot ru
That bug will become 4 years old soon. Any progress?
 [2015-01-03 15:56 UTC] kassner@php.net
Reproducible on PHP 5.5.20.
 [2020-11-05 13:03 UTC] cmb@php.net
-Type: Bug
+Type: Feature/Change Request
 [2020-11-05 13:03 UTC] cmb@php.net
> […] but the current behavior is not exactly a bug. This is more
> a feature request.

That.

As a workaround, you can apply idn_to_ascii() manually:
<https://3v4l.org/gae31>.
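
One way such a workaround could look is sketched below. This is not necessarily the code behind the link above. The helper name validate_url_allowing_idn() is hypothetical, the URL splitting is deliberately simplified (no userinfo handling), and idn_to_ascii() requires the intl extension.

```php
<?php
// Hypothetical helper (not a PHP built-in): validate a URL whose host may be
// written in Unicode by converting the host to Punycode first.
function validate_url_allowing_idn(string $url)
{
    // Split into "scheme://", host, and the remainder.
    // Simplification for this sketch: userinfo (user:pass@) is not handled.
    if (!preg_match('~^([a-z][a-z0-9+.-]*://)([^/?#:@]+)(.*)$~is', $url, $m)) {
        return false;
    }
    [, $scheme, $host, $rest] = $m;

    // Convert the host to its ASCII (Punycode) form; false means conversion failed.
    $asciiHost = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiHost === false) {
        return false;
    }

    // Validate the rebuilt URL. On success the ASCII form is returned,
    // not the original Unicode spelling.
    return filter_var($scheme . $asciiHost . $rest, FILTER_VALIDATE_URL);
}

// Only run the demo when the intl extension is available.
if (function_exists('idn_to_ascii')) {
    // "\u{FC}" is ü, so the host is bücher.example.
    var_dump(validate_url_allowing_idn("http://b\u{FC}cher.example/123"));
}
```

Only the host is converted. A URL with non-ASCII characters in the path would still be rejected unless the path is percent-encoded beforehand.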
 