Bug #80961 PCRE treats ellipsis character as a letter
Submitted: 2021-04-16 09:14 UTC Modified: 2021-04-16 09:28 UTC
Votes:1
Avg. Score:4.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: f dot sowade at r9e dot de
Assigned:
Status: Not a bug
Package: PCRE related
PHP Version: 8.0.3
OS: Linux and Mac OS X
Private report: No
CVE-ID: None
 [2021-04-16 09:14 UTC] f dot sowade at r9e dot de
Description:
------------
The intl extension reports the Unicode ellipsis character (U+2026) as punctuation. PCRE matches the ellipsis when matching letters but not when matching punctuation, so the character class for the ellipsis differs between the intl extension and PCRE. I think intl is correct here and the character class should be punctuation, but both extensions should at least agree on the same character class.

Test script:
---------------
var_dump(IntlChar::ispunct("\u{2026}")); // true
var_dump(IntlChar::isalpha("\u{2026}")); // false

var_dump(preg_match('/\\p{Z}/', "\u{2026}")); // 0 (but would expect 1)
var_dump(preg_match('/\\p{L}/', "\u{2026}")); // 1 (but would expect 0)

Expected result:
----------------
bool(true)
bool(false)
int(1)
int(0)

Actual result:
--------------
bool(true)
bool(false)
int(0)
int(1)

History

 [2021-04-16 09:19 UTC] f dot sowade at r9e dot de
Sorry - the Test script should have had a P instead of a Z:

var_dump(IntlChar::ispunct("\u{2026}")); // true
var_dump(IntlChar::isalpha("\u{2026}")); // false

var_dump(preg_match('/\\p{P}/', "\u{2026}")); // 0 (but would expect 1)
var_dump(preg_match('/\\p{L}/', "\u{2026}")); // 1 (but would expect 0)
 [2021-04-16 09:28 UTC] nikic@php.net
-Status: Open
+Status: Not a bug
 [2021-04-16 09:28 UTC] nikic@php.net
You are missing the /u modifier and seem to have confused \p{P} with \p{Z}. The correct code is:

var_dump(preg_match('/\\p{P}/u', "\u{2026}")); // 1
var_dump(preg_match('/\\p{L}/u', "\u{2026}")); // 0