Bug #80398 FILTER_FLAG_EMAIL_UNICODE is not working correctly with Unicode "marks"
Submitted: 2020-11-22 09:14 UTC Modified: 2020-11-22 10:23 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: shantanoo at gmail dot com Assigned:
Status: Open Package: Filter related
PHP Version: 7.4.12 OS: OSX 11.0.1
Private report: No CVE-ID: None
 [2020-11-22 09:14 UTC] shantanoo at gmail dot com
Description:
------------
म@example.com and मी@example.com are both valid email addresses, but validation with FILTER_FLAG_EMAIL_UNICODE fails for the latter.

Test script:
---------------
<?php
$email = 'मी@example.com';
print filter_var($email, FILTER_VALIDATE_EMAIL, FILTER_FLAG_EMAIL_UNICODE);
?>

Expected result:
----------------
The following should be printed:

मी@example.com


History

 [2020-11-22 10:18 UTC] shantanoo at gmail dot com
According to TinoDidrikse@freenode:

<quote>
 https://github.com/php/php-src/blob/07fa13088e1349f4b5a044faeee57f2b34f6b6e4/ext/filter/logical_filters.c#L646 adds \pL\pN to allow for Unicode, for they forgot \pM for combining marks.
</quote>


<quote>
Lack of \pM would also hit European letters like å (which is the decomposed form of å).
</quote>
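
The point about decomposed letters can be checked with a small sketch. The character class below is a simplified stand-in for illustration, not the full pattern from logical_filters.c, and the two helper function names are hypothetical.

```php
<?php
// Illustration only: simplified character classes, not the real filter regex.
$precomposed = "\u{00E5}";  // å as one code point, LATIN SMALL LETTER A WITH RING ABOVE
$decomposed  = "a\u{030A}"; // 'a' followed by U+030A COMBINING RING ABOVE (a mark, Mn)

// Hypothetical helper: accepts only letters (L) and numbers (N).
function only_letters_numbers(string $s): bool
{
    return preg_match('/^[\pL\pN]+$/u', $s) === 1;
}

// Hypothetical helper: additionally accepts marks (M).
function letters_numbers_marks(string $s): bool
{
    return preg_match('/^[\pL\pN\pM]+$/u', $s) === 1;
}

var_dump(only_letters_numbers($precomposed));  // bool(true)
var_dump(only_letters_numbers($decomposed));   // bool(false)
var_dump(letters_numbers_marks($decomposed));  // bool(true)
```

Both strings render as the same letter, so a user cannot tell which form their input method produced.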
 [2020-11-22 10:23 UTC] requinix@php.net
-Summary: FILTER_FLAG_EMAIL_UNICODE is not working correctly
+Summary: FILTER_FLAG_EMAIL_UNICODE is not working correctly with Unicode "marks"
-Package: *Regular Expressions
+Package: Filter related
 [2020-11-22 10:23 UTC] requinix@php.net
Correct. The character there is
म U+092E DEVANAGARI LETTER MA
combined with
ी U+0940 DEVANAGARI VOWEL SIGN II

The latter is classified as a mark (Mc), and PHP's regex only allows letters (L) and numbers (N). If I modify the regex to include \pM then the email passes.
https://3v4l.org/duYfX

Can't speak to whether that modification is a valid change. Though I would think so.
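
For readers who cannot open the 3v4l link, the effect described above can be sketched roughly as follows. This is not the code behind that link. The function local_part_ok() is a hypothetical, heavily simplified local-part check and does not reproduce the actual pattern in ext/filter/logical_filters.c.

```php
<?php
// Hypothetical, heavily simplified local-part check for illustration;
// the real pattern in ext/filter/logical_filters.c is far more involved.
function local_part_ok(string $local, bool $allowMarks): bool
{
    // Letters and numbers only, or letters, numbers and marks.
    $class = $allowMarks ? '\pL\pN\pM' : '\pL\pN';
    return preg_match('/^[' . $class . ']+$/u', $local) === 1;
}

$ma  = "\u{092E}";         // U+092E DEVANAGARI LETTER MA (a letter)
$mii = "\u{092E}\u{0940}"; // U+092E followed by U+0940 DEVANAGARI VOWEL SIGN II (a mark, Mc)

var_dump(local_part_ok($ma, false));  // bool(true)
var_dump(local_part_ok($mii, false)); // bool(false)
var_dump(local_part_ok($mii, true));  // bool(true)
```

Whether marks should be accepted everywhere in the local part, or only after a base letter, is a design question this sketch does not try to answer.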
 [2020-11-22 12:09 UTC] shantanoo at gmail dot com
Yes, the fix works. Tested with Devanagari (मराठी mr, हिन्दी hi, संस्कृत sa languages).
 