Bug #72244 filter_var does not support unicode for local part of emails
Submitted: 2016-05-19 18:20 UTC Modified: 2016-11-21 14:18 UTC
From: iquito at gmx dot net Assigned: cmb (profile)
Status: Closed Package: Filter related
PHP Version: 7.0.6 OS:
Private report: No CVE-ID: None
 [2016-05-19 18:20 UTC] iquito at gmx dot net
Description:
------------
filter_var() is no longer up to date: as of RFC 6531, the local part of an email address may contain Unicode characters, and filter_var() should accept such addresses as valid.

Test script:
---------------
filter_var('bücher@example.net', FILTER_VALIDATE_EMAIL);

=> returns false, even though this is a valid address.

It would also be nice if filter_var() handled UTF-8 domains internally, i.e. ran idn_to_ascii() on the domain when necessary, only for the purpose of the check and not in the returned address. UTF-8 would then be handled automatically in the domain as well as in the local part. It is not obvious that you have to pre-process the address yourself in this way. Validation should be as easy as possible, and doing the idn_to_ascii() conversion inside filter_var() would have no drawbacks. The current implementation, by contrast, has a big one: you have to split up the address, convert the domain, and reassemble the address just to pass it to filter_var(), and you have to know all of that and get it right.
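The manual pre-processing described above can be sketched roughly as follows. This is a minimal sketch, assuming the intl extension is loaded (it provides idn_to_ascii() and INTL_IDNA_VARIANT_UTS46); the function name validate_email_with_idn_domain is hypothetical. It splits the address at the last "@", converts only the domain to its ASCII (Punycode) form for the check, and returns the original spelling on success, as the report asks.

```php
<?php
/**
 * Hypothetical helper, not part of PHP: validates an address whose domain
 * may contain Unicode by converting only the domain for the check.
 * Assumes the intl extension (idn_to_ascii) is available.
 *
 * @return string|false the original address if valid, otherwise false
 */
function validate_email_with_idn_domain(string $address)
{
    // The domain is everything after the last "@" (a quoted local part may contain "@").
    $at = strrpos($address, '@');
    if ($at === false) {
        return false;
    }
    $local  = substr($address, 0, $at);
    $domain = substr($address, $at + 1);
    if ($domain === '') {
        return false;
    }

    // Convert the domain to ASCII (Punycode); idn_to_ascii() returns false on failure.
    $asciiDomain = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiDomain === false) {
        return false;
    }

    // Validate the reassembled address, but hand back the original spelling.
    if (filter_var($local . '@' . $asciiDomain, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }
    return $address;
}

var_dump(validate_email_with_idn_domain('user@bücher.example')); // the original address
var_dump(validate_email_with_idn_domain('bücher@example.net'));  // bool(false): local part must still be ASCII
```

Only the domain is converted here; a non-ASCII local part is still rejected by the default FILTER_VALIDATE_EMAIL check.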


History

 [2016-05-20 12:47 UTC] iquito at gmx dot net
In addition: according to the new RFCs, the domain part can also contain Unicode. This means Unicode should be supported in both the local part and the domain part of the address, and the idn_to_ascii() step should no longer be necessary once filter_var() correctly supports these RFCs.
 [2016-05-20 13:35 UTC] derick@php.net
If this gets added, it should be opt-in with a flag.
 [2016-05-21 14:40 UTC] iquito at gmx dot net
I am all for options; they would be especially useful for email validation, because different applications have different constraints. For example, filter_var() also returns false for root@localhost, which could be a valid use case in some applications but not in others.

But the default should be to validate as permissively as is reasonable, with options to restrict validation further for specific applications, as chosen by the developer. So even if Unicode email addresses are still rare today, there should be no default restriction against them, because they are valid and will become commonplace in a few years. The main point of a basic email validation function is to ensure that the address is well-formed and safe according to the current standards. It should not impose further restrictions unless the developer explicitly asks for them.
 [2016-11-21 14:18 UTC] cmb@php.net
-Status: Open +Status: Closed -Assigned To: +Assigned To: cmb
 [2016-11-21 14:18 UTC] cmb@php.net
Unicode in the local-part of email addresses is supported through
the FILTER_FLAG_EMAIL_UNICODE flag as of PHP 7.1.0, see
<https://3v4l.org/mHoIk>.

I'm not sure about Unicode support for the domain, but this is
tracked by bug #39469, anyway.
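A short sketch of the opt-in flag, assuming PHP 7.1.0 or later. It contrasts the default behaviour with FILTER_FLAG_EMAIL_UNICODE and shows that the flag covers only the local part, so a Unicode domain is still rejected unless it is converted first (for example with idn_to_ascii() from the intl extension).

```php
<?php
// Requires PHP >= 7.1.0, where FILTER_FLAG_EMAIL_UNICODE was added (opt-in).
$address = 'bücher@example.net';

// Default: the local part must be ASCII, so this is rejected.
var_dump(filter_var($address, FILTER_VALIDATE_EMAIL)); // bool(false)

// With the flag, Unicode characters are accepted in the local part;
// on success filter_var() returns the address unchanged.
var_dump(filter_var($address, FILTER_VALIDATE_EMAIL, FILTER_FLAG_EMAIL_UNICODE));

// The flag does not extend to the domain: a Unicode domain is still rejected.
var_dump(filter_var('bücher@bücher.example', FILTER_VALIDATE_EMAIL, FILTER_FLAG_EMAIL_UNICODE)); // bool(false)
```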
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Wed Dec 04 19:01:32 2024 UTC