Bug #78420 preg_match failed with Unicode character scripts
Submitted: 2019-08-16 09:51 UTC Modified: 2019-08-24 09:28 UTC
From: noskov dot vlad at gmail dot com Assigned: cmb (profile)
Status: Not a bug Package: PCRE related
PHP Version: 7.4.0beta2 OS:
Private report: No CVE-ID: None
 [2019-08-16 09:51 UTC] noskov dot vlad at gmail dot com
When preg_match is run with Unicode character scripts in the regular expression, it always returns 0.

Test script:
var_dump(preg_match('~[\pL\p{Cyrillic}]~u', 'test'));

Expected result:

Actual result:


 [2019-08-16 10:04 UTC]
Works with JIT disabled.
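
For anyone who wants to reproduce this, a minimal sketch of running the reporter's test script with the PCRE JIT switched off for a single request. It assumes the standard `pcre.jit` INI setting can be changed at runtime in the build under test, and that it is changed before the pattern is compiled for the first time; the variable name `$result` is only illustrative.

```php
<?php
// Disable the PCRE JIT compiler for this request only.
// Assumption: pcre.jit is changeable via ini_set() in this build.
ini_set('pcre.jit', '0');

// Same pattern and subject as the test script in the report.
$result = preg_match('~[\pL\p{Cyrillic}]~u', 'test');
var_dump($result);
```

The same setting can also be placed in php.ini or passed on the command line as `-d pcre.jit=0`.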
 [2019-08-16 10:04 UTC]
The problem is more subtle than this, as the following all work:

var_dump(preg_match('~[\pL\p{Cyrillic}]~u', 'testЛ'));
var_dump(preg_match('~[\pL]~u', 'testЛ'));
var_dump(preg_match('~[\p{Cyrillic}]~u', 'testЛ'));
var_dump(preg_match('~[\pL]~u', 'test'));
var_dump(preg_match('~[\p{Cyrillic}]~u', 'Л'));
var_dump(preg_match('~\pL~u', 'Л'));
var_dump(preg_match('~\p{Cyrillic}~u', 'testЛ'));
var_dump(preg_match('~\pL\p{Cyrillic}~u', 'testЛ'));
 [2019-08-16 11:56 UTC]
The regression was introduced by the update to PCRE2 10.33[1],
so this is apparently an upstream issue, which might have been resolved

[1] <;a=commit;h=aa9433e9286801a6af6d4cee73d9a165a61e0e3b>
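
To check which PCRE2 release a given PHP binary is linked against, something along these lines could be used. It is only a sketch: it relies on the predefined `PCRE_VERSION` constant, and the variable names are illustrative. It only tests for 10.33 itself, since the thread does not state which release contains the fix.

```php
<?php
// PCRE_VERSION is a predefined string: a version number followed by a release date.
$pcreVersion = PCRE_VERSION;

// Keep only the leading version number (everything before the first space).
$numeric = strtok($pcreVersion, ' ');

// True only when the linked library reports exactly 10.33.
$isRelease1033 = version_compare($numeric, '10.33', '==');

var_dump($pcreVersion, $isRelease1033);
```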
 [2019-08-20 14:25 UTC] noskov dot vlad at gmail dot com
PCRE2 bug report:
 [2019-08-24 09:28 UTC]
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2019-08-24 09:28 UTC]
This has been confirmed to be an upstream issue (which has
already been resolved), so I'm closing this ticket.