Bug #80961 PCRE treats ellipsis character as a letter
Submitted: 2021-04-16 09:14 UTC
Modified: 2021-04-16 09:28 UTC
Votes: 1
Avg. Score: 4.0 ± 0.0
Reproduced: 1 of 1 (100.0%)
Same Version: 1 (100.0%)
Same OS: 1 (100.0%)
From: f dot sowade at r9e dot de
Assigned:
Status: Not a bug
Package: PCRE related
PHP Version: 8.0.3
OS: Linux and Mac OS X
Private report: No
CVE-ID: None
 [2021-04-16 09:14 UTC] f dot sowade at r9e dot de
Description:
------------
The intl extension reports the Unicode ellipsis character (U+2026) as punctuation. PCRE, however, matches the ellipsis when matching letters but not when matching punctuation, so the character class for the ellipsis differs between the intl extension and PCRE. I think intl is correct here and the character class should be punctuation, but at minimum both extensions should agree on the same character class.

Test script:
---------------
var_dump(IntlChar::ispunct("\u{2026}")); // true
var_dump(IntlChar::isalpha("\u{2026}")); // false

var_dump(preg_match('/\\p{Z}/', "\u{2026}")); // 0 (but would expect 1)
var_dump(preg_match('/\\p{L}/', "\u{2026}")); // 1 (but would expect 0)

Expected result:
----------------
bool(true)
bool(false)
int(1)
int(0)

Actual result:
--------------
bool(true)
bool(false)
int(0)
int(1)

 [2021-04-16 09:19 UTC] f dot sowade at r9e dot de
Sorry - the Test script should have had a P instead of a Z:

var_dump(IntlChar::ispunct("\u{2026}")); // true
var_dump(IntlChar::isalpha("\u{2026}")); // false

var_dump(preg_match('/\\p{P}/', "\u{2026}")); // 0 (but would expect 1)
var_dump(preg_match('/\\p{L}/', "\u{2026}")); // 1 (but would expect 0)
 [2021-04-16 09:28 UTC] nikic@php.net
-Status: Open +Status: Not a bug
 [2021-04-16 09:28 UTC] nikic@php.net
You are missing the /u modifier and seem to have confused \p{P} with \p{Z}. The correct code is:

var_dump(preg_match('/\\p{P}/u', "\u{2026}")); // 1
var_dump(preg_match('/\\p{L}/u', "\u{2026}")); // 0
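
For illustration, here is a minimal sketch of why the missing /u modifier produces the reported result. It assumes the usual PCRE behaviour that, without /u, the subject is treated as a sequence of single-byte characters rather than as UTF-8, so \p{L} is tested against each byte of the encoded ellipsis individually. The variable names are only for this example.

```php
<?php
// U+2026 HORIZONTAL ELLIPSIS is encoded in UTF-8 as the three bytes E2 80 A6.
$ellipsis = "\u{2026}";
$bytes = bin2hex($ellipsis);

// Without /u the subject is scanned byte by byte; record which byte \p{L} matched.
$withoutU = preg_match('/\p{L}/', $ellipsis, $m);
$matchedByte = $withoutU ? bin2hex($m[0]) : null;

// With /u the subject is decoded as UTF-8 and U+2026 is tested as one code point.
$letterWithU = preg_match('/\p{L}/u', $ellipsis);
$punctWithU  = preg_match('/\p{P}/u', $ellipsis);

var_dump($bytes, $withoutU, $matchedByte, $letterWithU, $punctWithU);
```

If that assumption holds, the byte matched without /u should be the leading 0xE2, which read as a single-byte character corresponds to U+00E2 (a Latin letter), while none of the three bytes is punctuation on its own. That would account for both results in the report.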
 