Bug #76512 \w no longer includes unicode characters
Submitted: 2018-06-21 13:47 UTC Modified: 2018-06-21 20:48 UTC
Votes:2
Avg. Score:4.0 ± 1.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: leo dot feyer at gmail dot com Assigned: cmb (profile)
Status: Closed Package: PCRE related
PHP Version: 7.3.0alpha2 OS: MacOS
Private report: No CVE-ID: None

 [2018-06-21 13:47 UTC] leo dot feyer at gmail dot com
Description:
------------
The \w escape sequence no longer matches Unicode characters in PHP 7.3 when the u modifier is used.

See https://3v4l.org/uJKEC

I don't know whether this is intended, but since the change is not mentioned in the UPGRADING file, it might not be.


Test script:
---------------
var_dump(preg_match('/\w/u', 'ä'));

Expected result:
----------------
The test script should return int(1), as it does on PHP < 7.3, but it returns int(0) on PHP 7.3.
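Until a build with the fix is available, one possible workaround is to spell out the Unicode properties explicitly instead of relying on \w. The sketch below uses a hypothetical helper, `is_word_char()` (not a PHP built-in), and assumes the UCP meaning of \w in PCRE2 (letters, numbers, underscore). Newer PCRE2 releases may widen \w slightly, so treat the class as an approximation.

```php
<?php
// Hypothetical helper, not a PHP built-in: tests a single character against
// an explicit Unicode class that approximates \w under PCRE2_UCP
// (\p{L} letters, \p{N} numbers, underscore). It does not depend on
// whether the /u modifier also turns on PCRE2_UCP.
function is_word_char(string $ch): bool
{
    // \A and \z anchor the whole subject; /u makes PCRE2 treat it as UTF-8.
    return preg_match('/\A[\p{L}\p{N}_]\z/u', $ch) === 1;
}

// "\u{E4}" is "ä" encoded as UTF-8, written as an escape to avoid file-encoding issues.
var_dump(is_word_char("\u{E4}")); // bool(true)
var_dump(is_word_char('-'));      // bool(false)
```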


History
 [2018-06-21 14:19 UTC] cmb@php.net
It seems to me that this PCRE_UCP[1] should actually be PCRE2_UCP.

[1] <https://github.com/php/php-src/blob/5eb1f92f31cafc48384f9096012f421b37f6d425/ext/pcre/php_pcre.c#L676>
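The diagnosis can be checked from user land. PCRE2 accepts a leading `(*UCP)` in the pattern, which sets the PCRE2_UCP option from inside the pattern itself. The sketch below, assuming PHP 7.3 or later (which uses PCRE2), compares the plain `/u` pattern with the `(*UCP)` variant. On an affected build only the second should match, which points at the missing compile option rather than at the subject string or at UTF-8 handling.

```php
<?php
// "\u{E4}" is "ä" (U+00E4, a lowercase letter) encoded as UTF-8.
$subject = "\u{E4}";

// Plain /u: \w matches non-ASCII letters only if PHP passes PCRE2_UCP
// along with PCRE2_UTF when compiling the pattern.
$plain = preg_match('/\w/u', $subject);

// (*UCP) must come first in the pattern; it switches \w, \d, \s and \b
// to Unicode properties even if PHP does not pass PCRE2_UCP itself.
$withUcp = preg_match('/(*UCP)\w/u', $subject);

printf("PHP %s: /\\w/u => %d, /(*UCP)\\w/u => %d\n", PHP_VERSION, $plain, $withUcp);
```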
 [2018-06-21 15:51 UTC] cmb@php.net
-Status: Open +Status: Analyzed -Assigned To: +Assigned To: cmb
 [2018-06-21 15:51 UTC] cmb@php.net
> It seems to me that this PCRE_UCP[1] should actually be PCRE2_UCP.

Indeed.
 [2018-06-21 17:25 UTC] ab@php.net
That's the correct diagnosis, Christoph. It was not intended; it is a trivial porting mistake. It also affected ext/pcre/tests/bug52971.phpt, which made it look like a behaviour change.

There might also be an additional issue with locale handling on Windows. I'm not sure yet whether it's related to PCRE2 or setlocale(); it needs more investigation.

Thanks.
 [2018-06-21 20:47 UTC] cmb@php.net
Automatic comment on behalf of cmbecker69@gmx.de
Revision: http://git.php.net/?p=php-src.git;a=commit;h=71d16feebbb38f69007d8b7bec44afeb916281fb
Log: Fix #76512: \w no longer includes unicode characters
 [2018-06-21 20:47 UTC] cmb@php.net
-Status: Analyzed +Status: Closed
 [2018-06-21 20:48 UTC] cmb@php.net
Thanks, Anatol. I'll try to have a look at the locale stuff on Windows later.

And of course, thanks, Leo, for reporting this issue!
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Nov 23 09:01:28 2024 UTC