Request #53474 FILTER_VALIDATE_URL should not fail URL's that use IDN
Submitted: 2010-12-05 00:28 UTC Modified: 2020-11-05 13:03 UTC
Votes:17
Avg. Score:4.0 ± 1.3
Reproduced:13 of 13 (100.0%)
Same Version:9 (69.2%)
Same OS:9 (69.2%)
From: gunther at keryx dot se Assigned:
Status: Open Package: Filter related
PHP Version: Irrelevant OS: All
Private report: No CVE-ID: None
 [2010-12-05 00:28 UTC] gunther at keryx dot se
Description:
------------
If a URL uses an IDN, it should be considered valid.

The test script outputs three blank lines. It should output the URLs.

Examples from http://mathiasbynens.be/demo/url-regex


Test script:
---------------
<?php
echo filter_var('http://✪df.ws/123', FILTER_VALIDATE_URL);
echo "\n";
echo filter_var('http://⌘.ws/', FILTER_VALIDATE_URL);
echo "\n";
echo filter_var('http://➡.ws/䨹', FILTER_VALIDATE_URL);
echo "\n";



Expected result:
----------------
http://✪df.ws/123
http://⌘.ws/
http://➡.ws/䨹



Actual result:
--------------
\n
\n
\n

 [2010-12-05 10:46 UTC] mathias at qiwi dot be
I was trying to post a comment here, but it wouldn’t let me (“ERROR: Please do not 
SPAM our bug system”), probably because of all the “URLs” I was trying to post.

Oh well, here’s the comment I tried to post: 
http://paste.pocoo.org/raw/VBO3Gn0vIipLQPRIgAEB/
 [2010-12-05 19:21 UTC] rasmus@php.net
mathias, you are not testing the current code.  Try it with 5.3.3.
 [2010-12-05 21:12 UTC] gunther at keryx dot se
My initial test was done using PHP 5.3.3

The lack of IDN support is still present in 5.3.3

All of Mathias' additional test cases except the last one are handled correctly, however. That is, this one is accepted:

http://a.b--c.de/

It should fail. That's probably an edge case, though. The lack of IDN support is the worst problem.
 [2010-12-06 10:42 UTC] aharvey@php.net
To be completely clear, filter_var(..., FILTER_VALIDATE_URL) already
handles internationalised domain names in their canonical Punycode
form. I guess what this is really going to boil down to is whether
filter_var() should also support IDN when it's represented in a
Unicode character set, given that's really an intermediate
representation for the user's benefit.
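
To illustrate the distinction, here is a minimal sketch. The host bücher.example and its Punycode form xn--bcher-kva.example are illustrative assumptions, not URLs from this report, and the file is assumed to be saved as UTF-8:

```php
<?php
// Same (hypothetical) host written two ways.
$punycodeUrl = 'http://xn--bcher-kva.example/'; // ASCII-compatible (Punycode) form
$unicodeUrl  = 'http://bücher.example/';        // Unicode form, as a user would type it

// filter_var() returns the URL string on success and false on failure.
$punycodeResult = filter_var($punycodeUrl, FILTER_VALIDATE_URL);
$unicodeResult  = filter_var($unicodeUrl, FILTER_VALIDATE_URL);

var_dump($punycodeResult); // the Punycode form is accepted
var_dump($unicodeResult);  // the Unicode form is rejected
```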

My initial feeling on that is "not by default", but I guess we could
look at a flag for that filter (FILTER_FLAG_IDN_UTF8 or similar) which
enabled support for a particular character set (probably UTF-8).

Thoughts?
 [2010-12-07 01:22 UTC] cataphract@php.net
IDN refers only to the domain name. http://➡.ws/䨹 also includes non-ASCII characters in the path portion.

That part is bogus. http://aaa.ws/䨹 is not a proper, unambiguous URL: how 䨹 is percent-encoded depends on the character encoding (%E4%A8%B9 for UTF-8, but no standard requires it). As to IDNs, I agree with aharvey. I suppose we could add a new flag, but the current behavior is not exactly a bug. This is more of a feature request.
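
A minimal sketch of the path issue, assuming the caller decides that UTF-8 is the intended encoding. The helper name encode_path_segment_utf8() is hypothetical, and the file is assumed to be saved as UTF-8:

```php
<?php
// Hypothetical helper: percent-encode one path segment.
// rawurlencode() works on bytes, so the result depends on the encoding
// of the input string; here the input is assumed to be UTF-8.
function encode_path_segment_utf8(string $segment): string
{
    return rawurlencode($segment);
}

$encoded = encode_path_segment_utf8('䨹'); // "%E4%A8%B9" for UTF-8 input
$url     = 'http://aaa.ws/' . $encoded;

// With the path percent-encoded, the URL is plain ASCII and validates.
var_dump(filter_var($url, FILTER_VALIDATE_URL));
```

The same character in a different source encoding would produce different bytes and therefore a different percent-encoded path, which is why the unencoded form is ambiguous.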
 [2014-04-18 02:24 UTC] scif-1986 at ya dot ru
That bug will become 4 years old soon. Any progress?
 [2015-01-03 15:56 UTC] kassner@php.net
Reproducible on PHP 5.5.20.
 [2020-11-05 13:03 UTC] cmb@php.net
-Type: Bug +Type: Feature/Change Request
 [2020-11-05 13:03 UTC] cmb@php.net
> […] but the current behavior is not exactly a bug. This is more
> a feature request.

That.

As a workaround, you can apply idn_to_ascii() manually:
<https://3v4l.org/gae31>.
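
The linked snippet is not reproduced here. The following is a minimal sketch of the same idea, assuming the intl extension is loaded. The helper name validate_idn_url() is hypothetical, and the example host bücher.example is an illustrative assumption:

```php
<?php
// Hypothetical helper (not part of PHP): convert the host of a URL to its
// Punycode form, rebuild the URL, then hand it to FILTER_VALIDATE_URL.
// Returns the ASCII URL on success, false on failure.
function validate_idn_url(string $url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }

    // idn_to_ascii() is provided by the intl extension.
    $asciiHost = idn_to_ascii($parts['host'], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiHost === false) {
        return false;
    }

    // Reassemble the URL with the converted host; other parts are unchanged.
    $userInfo = '';
    if (isset($parts['user'])) {
        $userInfo = $parts['user']
            . (isset($parts['pass']) ? ':' . $parts['pass'] : '')
            . '@';
    }
    $candidate = $parts['scheme'] . '://'
        . $userInfo
        . $asciiHost
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '')
        . (isset($parts['query']) ? '?' . $parts['query'] : '')
        . (isset($parts['fragment']) ? '#' . $parts['fragment'] : '');

    return filter_var($candidate, FILTER_VALIDATE_URL);
}

var_dump(validate_idn_url('http://bücher.example/123'));
```

Only the host is converted. Non-ASCII characters in the path or query are left as they are, so such URLs would still need to be percent-encoded by the caller before validation.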
 