Bug #80090 Incorrect character detection with unicode escape sequence
Submitted: 2020-09-10 23:04 UTC Modified: 2020-09-10 23:44 UTC
From: busuioc dot alexandru at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 7.4.10 OS: Alpine
Private report: No CVE-ID: None
 [2020-09-10 23:04 UTC] busuioc dot alexandru at gmail dot com
Description:
------------
PHP 7.4.9 on Alpine image

---------------------
# php -m
[PHP Modules]
apcu
blackfire
Core
ctype
curl
date
dom
fileinfo
filter
ftp
gd
hash
iconv
json
libxml
mbstring
mcrypt
memcached
mongodb
mysqli
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
pdo_sqlite
phalcon
Phar
posix
psr
readline
redis
Reflection
session
SimpleXML
sockets
sodium
SPL
sqlite3
standard
tokenizer
xdebug
xml
xmlreader
xmlwriter
Zend OPcache
zip
zlib

[Zend Modules]
Xdebug
Zend OPcache
blackfire

-------------------------
# php --ri pcre

pcre

PCRE (Perl Compatible Regular Expressions) Support => enabled
PCRE Library Version => 10.34 2019-11-21
PCRE Unicode Version => 12.1.0
PCRE JIT Support => enabled
PCRE JIT Target => x86 64bit (little endian + unaligned)

Directive => Local Value => Master Value
pcre.backtrack_limit => 1000000 => 1000000
pcre.recursion_limit => 100000 => 100000
pcre.jit => 1 => 1


Test script:
---------------
Using the '\p{L}' escape sequence in a character class, followed by another escape sequence with a script name (e.g. \p{Cyrillic} or \p{Arabic}), causes some ASCII letters to be detected incorrectly: at least the lowercase "s" fails to match.


Steps to reproduce:


```
<?php

$pattern1 = '/^[0-9_\p{L}\p{Cyrillic}]+$/u';

var_dump(preg_match($pattern1, 's'));
var_dump(preg_match($pattern1, 'S'));
```

The output is:
```
int(0)
int(1)
```

Removing the "\p{Cyrillic}" from the regex pattern above will make it return the expected output:
```
int(1)
int(1)
```

As a workaround, adding the explicit "a-zA-Z" range to the character class in the regex pattern avoids the problem.
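
A minimal sketch of that workaround, assuming the same character class as in the steps to reproduce with only the ASCII ranges added. The subject strings are illustrative, and the behavior on affected builds is as described in this report rather than independently verified:

```php
<?php

// Pattern from the report with explicit ASCII ranges added as the workaround.
$pattern = '/^[0-9_a-zA-Z\p{L}\p{Cyrillic}]+$/u';

// Illustrative subjects: the two from the report plus a mixed ASCII string.
$results = [];
foreach (['s', 'S', 'abc_123'] as $subject) {
    $results[$subject] = preg_match($pattern, $subject);
}

var_dump($results);
```

Because "s" and "S" are now covered by the literal ranges, the match no longer depends on how \p{L} is evaluated next to the script property.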


This bug can also be reproduced at https://3v4l.org/bqsue



 [2020-09-10 23:44 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2020-09-10 23:44 UTC] requinix@php.net
Upstream bug fixed in PCRE2 10.35.
https://bugs.exim.org/show_bug.cgi?id=2432
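
Since the fix shipped in PCRE2 10.35, one way to tell whether a given PHP build links a library with the fix is to inspect the PCRE_VERSION constant. The sketch below assumes the version string has the form shown in the "php --ri pcre" output above ("10.34 2019-11-21"); the helper name pcreHasUpstreamFix is hypothetical and not part of PHP, and distribution builds that backport the patch to an older version would be reported as unfixed.

```php
<?php

// Hypothetical helper (not a PHP built-in): returns true when the given
// PCRE version string reports 10.35 or later.
function pcreHasUpstreamFix(string $pcreVersion = PCRE_VERSION): bool
{
    // PCRE_VERSION looks like "10.34 2019-11-21"; keep only the version number.
    $number = explode(' ', $pcreVersion)[0];

    return version_compare($number, '10.35', '>=');
}

// Reports the result for the library this PHP build is linked against.
var_dump(pcreHasUpstreamFix());
```

Passing the string explicitly, for example pcreHasUpstreamFix('10.34 2019-11-21'), allows checking a version reported by another environment.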
 