Bug #73251 [а-я] range does not contain all Cyrillic letters, filter_var() fails for valid st
Submitted: 2016-10-05 11:01 UTC Modified: 2016-10-05 11:49 UTC
From: ikonta at yandex dot ru Assigned: cmb (profile)
Status: Not a bug Package: PCRE related
PHP Version: 5.6.26 OS: Linux
Private report: No CVE-ID: None
 [2016-10-05 11:01 UTC] ikonta at yandex dot ru
Description:
------------
The modern Russian alphabet contains 33 letters.
The standard UTF-8 range covers the 32 most common ones, but misses one ('ё').

Standard
U+0410	А
…
U+044F  я

Exceptions:
U+0401	Ё
U+0451	ё
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=1024

The [а-я] range should include the letter 'ё' (and [А-Я] should include 'Ё'), but it actually does not.

Test script:
---------------
$ cat cyr_io.php
<?php
$str = "еще";
$valid_string_expr = '/(*UTF8)[^а-я]/';
if (filter_var($str, FILTER_VALIDATE_REGEXP, array("options"=>array("regexp"=>"$valid_string_expr"))) === FALSE)
  echo ("string contains only cyrillic small letters\n");
else
  echo ("string contains NOT only cyrillic small letters\n");

$str = "ещё";
if (filter_var($str, FILTER_VALIDATE_REGEXP, array("options"=>array("regexp"=>"$valid_string_expr"))) === FALSE)
  echo ("string contains only cyrillic small letters\n");
else
  echo ("string contains NOT only cyrillic small letters\n");
?>

Expected result:
----------------
$ php cyr_io.php
string contains only cyrillic small letters
string contains only cyrillic small letters

Actual result:
--------------
$ php cyr_io.php
string contains only cyrillic small letters
string contains NOT only cyrillic small letters

 [2016-10-05 11:40 UTC] cmb@php.net
-Status: Open +Status: Verified
 [2016-10-05 11:40 UTC] cmb@php.net
I can confirm the behavior which is, however, not particularly
related to ext/filter, but rather is a general issue of PCRE, see
<https://3v4l.org/oBqFi>.

Quite likely that is an issue of libpcre and not the PHP bindings,
though.
 [2016-10-05 11:49 UTC] cmb@php.net
-Status: Verified +Status: Not a bug
-Package: Filter related +Package: PCRE related
-Assigned To: +Assigned To: cmb
 [2016-10-05 11:49 UTC] cmb@php.net
Actually, this is rather an issue with the Unicode layout. Ё
(U+0401) comes before А (U+0410), so it is rightly not considered
to be part of the range [А-Я].
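
For illustration, a minimal sketch of the usual workaround: list 'ё' explicitly next to the range. It assumes PHP 7.2 or later with ext/mbstring (for mb_ord()) and a UTF-8 encoded source file; the helper name is_lower_russian() is hypothetical and not part of PHP or of this report.

```php
<?php
// Print the code points involved. Ё (U+0401) sorts before А (U+0410)
// and ё (U+0451) sorts after я (U+044F), so neither range covers them.
foreach (['Ё', 'А', 'Я', 'а', 'я', 'ё'] as $ch) {
    printf("%s U+%04X\n", $ch, mb_ord($ch, 'UTF-8'));
}

// Hypothetical helper: true when $s is non-empty and consists only of
// lowercase Russian letters. 'ё' is added to the class explicitly.
// The /u modifier makes PCRE treat pattern and subject as UTF-8;
// \A and \z anchor to the very start and end of the string.
function is_lower_russian(string $s): bool
{
    return preg_match('/\A[а-яё]+\z/u', $s) === 1;
}

var_dump(is_lower_russian('еще')); // bool(true)
var_dump(is_lower_russian('ещё')); // bool(true)
var_dump(is_lower_russian('abc')); // bool(false)
```

The same idea should carry over to the test script in the report by using '/(*UTF8)[^а-яё]/' as the pattern; for the uppercase range, [А-ЯЁ] would be the analogous class.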
 