Bug #53474 FILTER_VALIDATE_URL should not fail URL's that use IDN
Submitted: 2010-12-05 00:28 UTC Modified: 2010-12-07 01:22 UTC
Avg. Score:4.4 ± 0.8
Reproduced:13 of 13 (100.0%)
Same Version:9 (69.2%)
Same OS:9 (69.2%)
From: gunther at keryx dot se Assigned:
Status: Open Package: Filter related
PHP Version: Irrelevant OS: All
Private report: No CVE-ID: None

 [2010-12-05 00:28 UTC] gunther at keryx dot se
If a URL uses an IDN, it should be considered valid.

The test script outputs three blank lines. It should output the URLs.

Examples from

Test script:
echo filter_var('http://✪', FILTER_VALIDATE_URL);
echo "\n";
echo filter_var('http://⌘.ws/', FILTER_VALIDATE_URL);
echo "\n";
echo filter_var('http://➡.ws/䨹', FILTER_VALIDATE_URL);
echo "\n";
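
The blank lines appear because filter_var() returns false on failure, and echo prints false as an empty string. A minimal sketch that makes the failures explicit, using the same three URLs (the variable names are only for illustration):

```php
<?php
// Same URLs as in the test script above.
$urls = array('http://✪', 'http://⌘.ws/', 'http://➡.ws/䨹');

$results = array();
foreach ($urls as $url) {
    // filter_var() returns the validated string on success, false on failure.
    $results[$url] = filter_var($url, FILTER_VALIDATE_URL);
    // var_dump() shows bool(false) where echo would print nothing.
    var_dump($results[$url]);
}
```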

Expected result:

Actual result:


 [2010-12-05 10:46 UTC] mathias at qiwi dot be
I was trying to post a comment here, but it wouldn’t let me (“ERROR: Please do not 
SPAM our bug system”), probably because of all the “URLs” I was trying to post.

Oh well, here’s the comment I tried to post:
 [2010-12-05 19:21 UTC]
mathias, you are not testing the current code.  Try it with 5.3.3.
 [2010-12-05 21:12 UTC] gunther at keryx dot se
My initial test was done using PHP 5.3.3

The lack of IDN support is still present in 5.3.3

However, all of Mathias' additional test cases except the last one are handled correctly. That is, this one is accepted:

It should fail. That's probably an edge case, though. The lack of IDN support is the worst problem.
 [2010-12-06 10:42 UTC]
To be completely clear, filter_var(..., FILTER_VALIDATE_URL) already
handles internationalised domain names in their canonical Punycode
form. I guess what this is really going to boil down to is whether
filter_var() should also support IDN when it's represented in a
Unicode character set, given that's really an intermediate
representation for the user's benefit.

My initial feeling on that is "not by default", but I guess we could
look at a flag for that filter (FILTER_FLAG_IDN_UTF8 or similar) which
enabled support for a particular character set (probably UTF-8).
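
Until such a flag exists, a caller can convert the host to Punycode before validating. The following is only a sketch under stated assumptions: validate_idn_url() is a hypothetical helper, not part of PHP; it relies on idn_to_ascii() from the intl extension being available; it expects UTF-8 input; and its regular expression is a simplified split that ignores userinfo and IPv6 literals.

```php
<?php
// Hypothetical helper (not part of PHP): converts a UTF-8 host name to its
// Punycode form with intl's idn_to_ascii(), then runs the stock validator.
function validate_idn_url($url)
{
    if (!function_exists('idn_to_ascii')) {
        return false; // intl extension not loaded
    }
    // Simplified split into "scheme://", host, and the remainder.
    // Assumption: no userinfo and no IPv6 literal in the authority part.
    if (!preg_match('~^([a-z][a-z0-9+.-]*://)([^/?#:]+)(.*)$~is', $url, $m)) {
        return false;
    }
    $host = idn_to_ascii($m[2], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($host === false) {
        return false; // host cannot be represented as an IDN
    }
    // Only the host is converted; a non-ASCII path or query still fails.
    return filter_var($m[1] . $host . $m[3], FILTER_VALIDATE_URL);
}

var_dump(validate_idn_url('http://bücher.example/'));
var_dump(validate_idn_url('http://➡.ws/䨹'));
```

The second call still returns false, because only the domain name is converted and the path contains a non-ASCII character.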

 [2010-12-07 01:22 UTC]
IDN refers only to the domain name. http://➡.ws/䨹 also includes a non-ASCII character in the path portion.

That part is bogus. http://➡.ws/䨹 is not a proper, unambiguous URL: how 䨹 is percent-encoded depends on the character encoding (%E4%A8%B9 for UTF-8, but no standard requires it). As to IDNs, I agree with aharvey. I suppose we could add a new flag, but the current behavior is not exactly a bug; this is more of a feature request.
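
To illustrate the ambiguity, here is a small sketch that percent-encodes the same character from two different byte encodings. The bytes are written as escapes so the result does not depend on the script's own encoding, and example.com is only a placeholder host.

```php
<?php
// U+4A39 (䨹) as raw bytes in two encodings.
$utf8    = "\xE4\xA8\xB9"; // UTF-8
$utf16be = "\x4A\x39";     // UTF-16BE

// rawurlencode() works on bytes, so the result depends on the encoding.
$encodedUtf8  = rawurlencode($utf8);    // "%E4%A8%B9"
$encodedUtf16 = rawurlencode($utf16be); // "J9": both bytes are unreserved ASCII

// Once the path is percent-encoded, the stock validator accepts the URL.
$result = filter_var('http://example.com/' . $encodedUtf8, FILTER_VALIDATE_URL);

var_dump($encodedUtf8, $encodedUtf16, $result);
```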
 [2014-04-18 02:24 UTC] scif-1986 at ya dot ru
That bug will become 4 years old soon. Any progress?
 [2015-01-03 15:56 UTC]
Reproducible on PHP 5.5.20.
Last updated: Thu Mar 21 05:01:26 2019 UTC