Bug #80090 Incorrect character detection with unicode escape sequence
Submitted: 2020-09-10 23:04 UTC
Modified: 2020-09-10 23:44 UTC
From: busuioc dot alexandru at gmail dot com
Status: Not a bug
Package: PCRE related
PHP Version: 7.4.10
OS: Alpine
Private report: No
CVE-ID: None
 [2020-09-10 23:04 UTC] busuioc dot alexandru at gmail dot com
Description:
------------
PHP 7.4.9 on an Alpine image.

---------------------
# php -m
[PHP Modules]
apcu
blackfire
Core
ctype
curl
date
dom
fileinfo
filter
ftp
gd
hash
iconv
json
libxml
mbstring
mcrypt
memcached
mongodb
mysqli
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
pdo_sqlite
phalcon
Phar
posix
psr
readline
redis
Reflection
session
SimpleXML
sockets
sodium
SPL
sqlite3
standard
tokenizer
xdebug
xml
xmlreader
xmlwriter
Zend OPcache
zip
zlib

[Zend Modules]
Xdebug
Zend OPcache
blackfire

-------------------------
# php --ri pcre

pcre

PCRE (Perl Compatible Regular Expressions) Support => enabled
PCRE Library Version => 10.34 2019-11-21
PCRE Unicode Version => 12.1.0
PCRE JIT Support => enabled
PCRE JIT Target => x86 64bit (little endian + unaligned)

Directive => Local Value => Master Value
pcre.backtrack_limit => 1000000 => 1000000
pcre.recursion_limit => 100000 => 100000
pcre.jit => 1 => 1


Test script:
---------------
Using the '\pL' (\p{L}) escape sequence followed by another escape sequence with a script name (e.g. \p{Arabic}) inside a character class, as in the script below, causes some ASCII letters to be rejected incorrectly (at least the lowercase "s").


Steps to reproduce:


```
<?php

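// Digits, underscore, any Unicode letter, or any Cyrillic character.
$pattern1 = '/^[0-9_\p{L}\p{Cyrillic}]+$/u';

var_dump(preg_match($pattern1, 's')); // lowercase "s"
var_dump(preg_match($pattern1, 'S')); // uppercase "S"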
```

The output is:
```
int(0)
int(1)
```

Removing "\p{Cyrillic}" from the pattern above produces the expected output:
```
int(1)
int(1)
```

As a workaround, adding the range "a-zA-Z" to the character class also fixes the problem.
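A minimal sketch of the workaround just described, assuming the same pattern as the reproduction script with "a-zA-Z" added to the character class. On PCRE2 versions without the bug the extra range is redundant with \p{L}, so it should be harmless there.

```php
<?php

// Workaround: list the ASCII letters explicitly so that "s" still matches
// on PCRE2 builds affected by the bug (the report was made against 10.34).
$pattern = '/^[0-9_a-zA-Z\p{L}\p{Cyrillic}]+$/u';

var_dump(preg_match($pattern, 's')); // int(1)
var_dump(preg_match($pattern, 'S')); // int(1)
```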


The bug can also be reproduced at https://3v4l.org/bqsue



History

 [2020-09-10 23:44 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2020-09-10 23:44 UTC] requinix@php.net
Upstream bug fixed in PCRE2 10.35.
https://bugs.exim.org/show_bug.cgi?id=2432
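Since the fix landed upstream in PCRE2 10.35, a script can check which PCRE2 version PHP is using before relying on the original pattern. A minimal sketch, assuming the PCRE_VERSION constant starts with the numeric version followed by a date (as in the "10.34 2019-11-21" value from the `php --ri pcre` output in the report); the function name pcre2HasClassFix is hypothetical, not part of PHP.

```php
<?php

// Hypothetical helper (not a PHP built-in): true when the PCRE2 library in use
// is at least 10.35, the release named in the upstream fix.
function pcre2HasClassFix(): bool
{
    // Assumed format: "10.34 2019-11-21"; keep only the part before the space.
    $version = explode(' ', PCRE_VERSION)[0];
    return version_compare($version, '10.35', '>=');
}

echo 'PCRE2 ', PCRE_VERSION, ': ',
    pcre2HasClassFix() ? 'includes the fix' : 'may be affected', PHP_EOL;
```

Versions below 10.35 are reported only as "may be affected", because the check cannot tell which older releases actually contain the bug.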
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Wed Jan 15 10:01:29 2025 UTC