Bug #73251 [а-я] range does not contain all Cyrillic letters, filter_var() fails for valid st
Submitted: 2016-10-05 11:01 UTC Modified: 2016-10-05 11:49 UTC
From: ikonta at yandex dot ru Assigned: cmb (profile)
Status: Not a bug Package: PCRE related
PHP Version: 5.6.26 OS: Linux
Private report: No CVE-ID: None
 [2016-10-05 11:01 UTC] ikonta at yandex dot ru
Description:
------------
The modern Russian alphabet contains 33 letters.
The standard Unicode range covers the 32 most common of them, but misses one ('ё').

Standard:
U+0410  А
…
U+044F  я

Exceptions:
U+0401  Ё
U+0451  ё
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=1024

The [а-я] range should include the letter 'ё' (and [А-Я] should include 'Ё'), but it does not.

Test script:
---------------
$ cat cyr_io.php
<?php
$str = "еще";
$valid_string_expr = '/(*UTF8)[^а-я]/';
if (filter_var($str, FILTER_VALIDATE_REGEXP, array("options"=>array("regexp"=>"$valid_string_expr"))) === FALSE)
  echo ("string contains only cyrillic small letters\n");
else
  echo ("string contains NOT only cyrillic small letters\n");

$str = "ещё";
if (filter_var($str, FILTER_VALIDATE_REGEXP, array("options"=>array("regexp"=>"$valid_string_expr"))) === FALSE)
  echo ("string contains only cyrillic small letters\n");
else
  echo ("string contains NOT only cyrillic small letters\n");
?>

Expected result:
----------------
$ php cyr_io.php
string contains only cyrillic small letters
string contains only cyrillic small letters

Actual result:
--------------
$ php cyr_io.php
string contains only cyrillic small letters
string contains NOT only cyrillic small letters

History

 [2016-10-05 11:40 UTC] cmb@php.net
-Status: Open +Status: Verified
 [2016-10-05 11:40 UTC] cmb@php.net
I can confirm the behavior which is, however, not particularly
related to ext/filter, but rather is a general issue of PCRE, see
<https://3v4l.org/oBqFi>.

Quite likely that is an issue of libpcre and not the PHP bindings,
though.
 [2016-10-05 11:49 UTC] cmb@php.net
-Status: Verified +Status: Not a bug -Package: Filter related +Package: PCRE related -Assigned To: +Assigned To: cmb
 [2016-10-05 11:49 UTC] cmb@php.net
Actually, this is rather an issue with the Unicode layout. Ё
(U+0401) comes before А (U+0410), so it is rightly not considered
to be part of the range [А-Я].
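
For illustration, a minimal sketch of one possible workaround, not something proposed in this report: since Ё (U+0401) and ё (U+0451) lie outside the contiguous А-я block, they can be listed explicitly in the character class. The helper name is_lower_russian() is hypothetical, and the sketch uses PHP's /u pattern modifier instead of the (*UTF8) prefix from the test script; it assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP or of the original report):
// returns true when $s consists only of lowercase Russian letters.
function is_lower_russian($s)
{
    // 'ё' (U+0451) is listed explicitly because it comes after 'я' (U+044F)
    // and is therefore not covered by the range а-я (U+0430..U+044F).
    // \A and \z anchor the match to the whole string.
    return preg_match('/\A[а-яё]+\z/u', $s) === 1;
}

var_dump(is_lower_russian('еще')); // bool(true)
var_dump(is_lower_russian('ещё')); // bool(true)

// The plain range alone rejects the second string, as described in this report.
var_dump(preg_match('/\A[а-я]+\z/u', 'ещё')); // int(0)
```

The same idea would apply to uppercase letters with [А-ЯЁ].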