Bug #80398 FILTER_FLAG_EMAIL_UNICODE is not working correctly with Unicode "marks"
Submitted: 2020-11-22 09:14 UTC
Modified: 2020-11-22 10:23 UTC
Votes: 1
Avg. Score: 5.0 ± 0.0
Reproduced: 0 of 0 (0.0%)
From: shantanoo at gmail dot com
Assigned:
Status: Open
Package: Filter related
PHP Version: 7.4.12
OS: OSX 11.0.1
Private report: No
CVE-ID: None
 [2020-11-22 09:14 UTC] shantanoo at gmail dot com
Description:
------------
Both म@example.com and मी@example.com are valid email addresses, but validation with FILTER_FLAG_EMAIL_UNICODE fails for the latter.

Test script:
---------------
<?php
$email = 'मी@example.com';
print filter_var($email, FILTER_VALIDATE_EMAIL, FILTER_FLAG_EMAIL_UNICODE);
?>

Expected result:
----------------
The following should be printed:

मी@example.com
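
Because print outputs nothing when filter_var() returns false, the failure is easy to miss. A minimal sketch that makes both outcomes visible with var_dump(); the variable names here are illustrative, and the result for the second address depends on the PHP build (on 7.4.12 it is reported above to fail):

```php
<?php
// The two addresses from the description.
$addresses = ['म@example.com', 'मी@example.com'];

$results = [];
foreach ($addresses as $address) {
    // Returns the address on success, or false if validation fails.
    $results[$address] = filter_var($address, FILTER_VALIDATE_EMAIL, FILTER_FLAG_EMAIL_UNICODE);
    var_dump($results[$address]);
}
```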


 [2020-11-22 10:18 UTC] shantanoo at gmail dot com
According to TinoDidrikse@freenode:

<quote>
https://github.com/php/php-src/blob/07fa13088e1349f4b5a044faeee57f2b34f6b6e4/ext/filter/logical_filters.c#L646 adds \pL\pN to allow for Unicode, but they forgot \pM for combining marks.
</quote>


<quote>
Lack of \pM would also hit European letters like å when written in decomposed form (a followed by U+030A COMBINING RING ABOVE) rather than as the precomposed U+00E5.
</quote>
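
The point about decomposed letters can be checked with a small sketch. The pattern below is a simplified stand-in for the \pL\pN character class mentioned in the quote, not the actual regex from logical_filters.c, and the variable names are illustrative:

```php
<?php
// Precomposed å: a single code point, U+00E5 (a letter, category Ll).
$precomposed = "\u{00E5}";
// Decomposed å: 'a' (U+0061) followed by U+030A COMBINING RING ABOVE (a mark, category Mn).
$decomposed = "a\u{030A}";

// Simplified stand-in: letters and numbers only, no marks.
$lettersAndNumbers = '/^[\pL\pN]+$/u';

var_dump(preg_match($lettersAndNumbers, $precomposed)); // int(1)
var_dump(preg_match($lettersAndNumbers, $decomposed));  // int(0)
```

Both strings render the same, but only the precomposed one consists purely of letters.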
 [2020-11-22 10:23 UTC] requinix@php.net
-Summary: FILTER_FLAG_EMAIL_UNICODE is not working correctly
+Summary: FILTER_FLAG_EMAIL_UNICODE is not working correctly with Unicode "marks"
-Package: *Regular Expressions
+Package: Filter related
 [2020-11-22 10:23 UTC] requinix@php.net
Correct. The character there is
म U+092E DEVANAGARI LETTER MA
combined with
ी U+0940 DEVANAGARI VOWEL SIGN II

The latter is classified as a mark (Mc), and PHP's regex only allows letters (L) and numbers (N). If I modify the regex to include \pM then the email passes.
https://3v4l.org/duYfX

Can't speak to whether that modification is a valid change. Though I would think so.
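
A rough sketch of the effect of adding \pM, using a hypothetical helper (localPartMatches) and a simplified character class rather than the full regex from ext/filter:

```php
<?php
// Hypothetical helper: tests a local part against a simplified character class.
function localPartMatches(string $localPart, string $class): bool
{
    return preg_match('/^[' . $class . ']+$/u', $localPart) === 1;
}

$ma  = "\u{092E}";          // DEVANAGARI LETTER MA (letter)
$mii = "\u{092E}\u{0940}";  // MA + DEVANAGARI VOWEL SIGN II (mark, Mc)

var_dump(localPartMatches($ma,  '\pL\pN'));    // bool(true)
var_dump(localPartMatches($mii, '\pL\pN'));    // bool(false)
var_dump(localPartMatches($mii, '\pL\pM\pN')); // bool(true)
```

This only illustrates the character classes involved; whether allowing all marks is appropriate for the real filter is the open question raised above.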
 [2020-11-22 12:09 UTC] shantanoo at gmail dot com
Yes, the fix works. Tested with Devanagari (मराठी mr, हिन्दी hi, संस्कृत sa languages).
 