Bug #76512 \w no longer includes unicode characters
Submitted: 2018-06-21 13:47 UTC Modified: 2018-06-21 20:48 UTC
Votes:2
Avg. Score:4.0 ± 1.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: leo dot feyer at gmail dot com Assigned: cmb (profile)
Status: Closed Package: PCRE related
PHP Version: 7.3.0alpha2 OS: MacOS
Private report: No CVE-ID: None
 [2018-06-21 13:47 UTC] leo dot feyer at gmail dot com
Description:
------------
The \w escape sequence no longer includes unicode characters in PHP 7.3.

See https://3v4l.org/uJKEC

I don't know if this is intended, but since the change is not mentioned in the UPGRADING file, it might not be.


Test script:
---------------
var_dump(preg_match('/\w/u', 'ä'));

Expected result:
----------------
The test script returns int(1) on PHP <7.3 and int(0) on PHP 7.3.
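
For comparison, here is a minimal sketch that runs the same subject through \w and through an explicit property class. It assumes, per the PCRE documentation, that \w with Unicode properties enabled is equivalent to letters, numbers and underscore. The helper name word_char_report() is hypothetical and not part of PHP. \p{..} classes do not rely on the flag that gives \w its Unicode meaning, so the 'property' entry should be usable as a workaround on an affected build.

```php
<?php
// Hypothetical helper (not part of PHP): reports how one subject fares
// against \w and against an explicit Unicode property class.
function word_char_report(string $subject): array
{
    return [
        // Relies on the /u modifier giving \w its Unicode meaning.
        'escape'   => preg_match('/\w/u', $subject),
        // Spells out letters, numbers and underscore via \p{..},
        // which only needs UTF mode and Unicode-enabled PCRE.
        'property' => preg_match('/[\p{L}\p{N}_]/u', $subject),
    ];
}

// "\u{E4}" is 'ä', written as an escape so the file encoding does not matter.
var_dump(word_char_report("\u{E4}"));
var_dump(word_char_report('a'));
```

On a build without this regression both entries should agree for every subject. A mismatch on 'ä' would point at the behaviour described in this report.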


History

 [2018-06-21 14:19 UTC] cmb@php.net
It seems to me that this PCRE_UCP[1] should actually be PCRE2_UCP.

[1] <https://github.com/php/php-src/blob/5eb1f92f31cafc48384f9096012f421b37f6d425/ext/pcre/php_pcre.c#L676>
 [2018-06-21 15:51 UTC] cmb@php.net
-Status: Open +Status: Analyzed -Assigned To: +Assigned To: cmb
 [2018-06-21 15:51 UTC] cmb@php.net
> It seems to me that this PCRE_UCP[1] should actually be PCRE2_UCP.

Indeed.
 [2018-06-21 17:25 UTC] ab@php.net
That's the correct diagnosis, Christoph. It was not intended and is a trivial porting mistake. It also affected ext/pcre/tests/bug52971.phpt, which looked like a behaviour change.

There might also be an additional issue with the locale handling on Windows. I'm not sure yet whether it's about PCRE2 or setlocale(); it needs more investigation.

Thanks.
 [2018-06-21 20:47 UTC] cmb@php.net
Automatic comment on behalf of cmbecker69@gmx.de
Revision: http://git.php.net/?p=php-src.git;a=commit;h=71d16feebbb38f69007d8b7bec44afeb916281fb
Log: Fix #76512: \w no longer includes unicode characters
 [2018-06-21 20:47 UTC] cmb@php.net
-Status: Analyzed +Status: Closed
 [2018-06-21 20:48 UTC] cmb@php.net
Thanks, Anatol. I'll try to have a look at the locale stuff on
Windows later.

And of course, thanks, Leo, for reporting this issue!
 
PHP Copyright © 2001-2018 The PHP Group
All rights reserved.
Last updated: Thu Dec 13 21:01:26 2018 UTC