Bug #77937 preg_match failed
Submitted: 2019-04-24 20:33 UTC Modified: 2019-05-07 21:26 UTC
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: v-altruo at microsoft dot com Assigned: cmb (profile)
Status: Closed Package: *General Issues
PHP Version: 7.3.5RC1 OS: Windows 10
Private report: No CVE-ID: None
 [2019-04-24 20:33 UTC] v-altruo at microsoft dot com
The test fails regardless of whether OPcache is enabled or disabled, and with both TS and NTS builds.
Test file location: ext\pcre\tests\locales.phpt

Test script:
setlocale(LC_ALL, 'pt_PT', 'pt', 'pt_PT.ISO8859-1', 'portuguese');
var_dump(preg_match('/^\w{6}$/', 'aאבחיט'));

Expected result:

Actual result:


 [2019-04-24 20:45 UTC]
-Status: Open +Status: Not a bug
 [2019-04-24 20:45 UTC]
Last I knew, Portuguese does not cover Hebrew characters.
 [2019-04-24 22:20 UTC] a at b dot c dot de
Adding the /u modifier to the pattern would help (assuming the source is encoded in UTF-8; you can't even SAY "aאבחיט" in ISO 8859-1).
 [2019-04-24 22:24 UTC] a at b dot c dot de
Incidentally, the test cited in the original report uses the string "aàáçéè", not Hebrew characters.
 [2019-04-25 09:14 UTC]
-Status: Not a bug +Status: Re-Opened -Assigned To: +Assigned To: cmb
 [2019-04-25 09:14 UTC]
Thanks for reporting!  I can reproduce the *test* *failure*. The
problem is that setlocale()[1] claims to support "pt_PT", but in
fact it does not.  The locale names that are actually supported
would be "pt-PT" and "portuguese".

I'm not sure yet what to do about this.  Simply fixing the test
case for Windows would be an option, but that would not fix the
underlying issue which may affect existing userland code.

[1] <>
 [2019-04-25 10:03 UTC]
Hmm, yes, it seems Windows will quite happily accept any "language" or "language_country" string regardless of whether either part exists, as long as the language code is 2 or 3 characters.

var_dump(setlocale(LC_ALL, "xjq_ASDF")); // returns xjq_ASDF
var_dump(setlocale(LC_ALL, "0")); // still xjq_ASDF


So, for maximum portability, it seems you have to list the Windows-specific strings before the usual ones, or at least put the names Windows really supports ahead of any short codes.

  "Portuguese_Portugal.28591", // windows okay (28591 is the codepage for ISO 8859-1), linux ignored
  "Portuguese_Portugal",       // windows okay, linux ignored
  "Portuguese",                // windows okay, linux ignored
  "pt_PT.ISO8859-1",           // windows ignored (bad codepage), linux okay
  "pt_PT",                     // windows okay (wrong), linux okay
  "pt"                         // windows okay (wrong), linux okay
 [2019-04-25 17:03 UTC]
-Package: PCRE related +Package: *General Issues
 [2019-04-25 17:03 UTC]
This issue is neither directly PCRE nor testing related.  Consider
the following script:

    var_dump(setlocale(LC_ALL, 'pt_PT'));

This outputs on my Windows system:

    string(5) "pt_PT"

The first three lines indicate that pt_PT is properly supported,
but the failing ctype_alpha() shows that it really is not.

The following C program confirms that the issue is not directly
related to PHP:

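    #include <stdio.h>
    #include <ctype.h>
    #include <locale.h>
    #include <string.h>

    int main(void)
    {
        /* Copy the initial decimal point: setlocale() may overwrite
           the structure returned by localeconv(). */
        char dp1[8] = "";
        strncat(dp1, localeconv()->decimal_point, sizeof dp1 - 1);

        char *loc = setlocale(LC_ALL, "pt_PT");
        struct lconv *lc2 = localeconv();
        /* 224 is 'à' in ISO 8859-1. */
        int alpha = isalpha(224);
        printf("%s %s %s %d\n", dp1, loc ? loc : "(null)", lc2->decimal_point, alpha);
        return 0;
    }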

Outputs on my Windows system (when built with VC15):

    . pt_PT , 0

Again, ctype fails to properly recognize the locale (which is the
reason for the failing test, since PCRE2 calls ctype functions to
build the character tables).

If I build with VC11, I get:

    . (null) . 0

Apparently, AppVeyor either behaves like this or properly
recognizes pt_PT for the ctype functions.
 [2019-05-07 21:26 UTC]
I've made a patch[1] which is meant to be as backward
compatible as reasonably possible.  To ease testing, corresponding
binary snapshots[2] are available as well.  I hope to get some
feedback on it before proceeding.


[1] <>
[2] <>
 [2019-05-16 09:11 UTC]
The following pull request has been associated:

Patch Name: Fix #77937: preg_match failed
On GitHub:
 [2019-06-11 06:46 UTC]
Automatic comment on behalf of
Log: Fix #77937: preg_match failed
 [2019-06-11 06:46 UTC]
-Status: Re-Opened +Status: Closed
PHP Copyright © 2001-2023 The PHP Group
All rights reserved.
Last updated: Mon Sep 25 14:01:25 2023 UTC