Bug #76909 preg_match difference between 7.3 and < 7.3
Submitted: 2018-09-20 20:41 UTC Modified: 2018-09-21 11:12 UTC
From: jay at diablomedia dot com Assigned:
Status: Closed Package: PCRE related
PHP Version: 7.3.0RC1 OS: Linux
Private report: No CVE-ID: None
 [2018-09-20 20:41 UTC] jay at diablomedia dot com
Description:
------------
While testing a fork of Zend Framework 1 (which we maintain for our legacy applications) against PHP 7.3.0RC1, I noticed that the Zend_Validate component's tests were failing: https://travis-ci.org/diablomedia/zf1-validate/jobs/431170057

In this case, PHP 7.3 lets some hostnames through validation that PHP < 7.3 rejects.

I dug a bit and was able to reproduce this with a single regular expression that PHP 7.3 evaluates differently from PHP 5.6 through PHP 7.2.

Here's a link to 3v4l.org demonstrating the difference (script is in the "Test script" section of this report): https://3v4l.org/DR3S5

Test script:
---------------
<?php

$domainPart = ' domain.com';

$regexChar = '/^[\x{0100}-\x{017f}]{1,63}$/iu';

preg_match($regexChar, $domainPart, $matches);

print_r($matches);

Expected result:
----------------
I would expect the $matches array in the test script to be empty, as it is in all PHP versions before 7.3.

Actual result:
--------------
The regex matches " domain.com", even though none of the characters in that string fall within the range \x{0100}-\x{017f}. Earlier versions of PHP do not match.

 [2018-09-20 21:08 UTC] requinix@php.net
Looks like a PCRE bug because turning off JIT makes it work correctly again. So that's the workaround.
ini_set("pcre.jit", 0);

PHP 7.3 is currently bundling PCRE 10.31. Version 10.32-RC1 includes a fix for something that looks related.
> 35. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
> negative class with no characters less than 0x100 followed by a positive class
> with only characters less than 0x100, the first class was incorrectly being
> auto-possessified, causing incorrect match failures.
https://www.pcre.org/changelog.txt
https://bugs.exim.org/show_bug.cgi?id=2300

While your example doesn't have the negated set, the fix for the bug was very simple so it could affect less complicated situations like yours as well.
 [2018-09-20 21:24 UTC] nikic@php.net
This bug still reproduces on master, which uses PCRE 10.32, so it was not fixed as part of that release.
 [2018-09-20 21:53 UTC] jay at diablomedia dot com
I filed a bug with PCRE: https://bugs.exim.org/show_bug.cgi?id=2321

I was able to confirm that disabling PCRE JIT does fix the issue.
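
For anyone who needs the workaround only around a specific call, here is a minimal sketch that switches pcre.jit off for one preg_match() and then restores the previous setting. The helper name matchWithoutJit() is hypothetical and not part of PHP. It also assumes the pattern has not already been compiled with the JIT earlier in the same request, since PHP caches compiled patterns and a cached one may keep its JIT code.

```php
<?php

/**
 * Run preg_match() with the PCRE JIT switched off, then restore the setting.
 * matchWithoutJit() is a hypothetical helper name, not part of PHP.
 */
function matchWithoutJit(string $pattern, string $subject): int
{
    // ini_set() returns the previous value, or false when the setting
    // cannot be changed (for example, PHP built without PCRE JIT support).
    $previous = ini_set('pcre.jit', '0');

    try {
        $result = preg_match($pattern, $subject);
    } finally {
        if ($previous !== false) {
            ini_set('pcre.jit', $previous);
        }
    }

    if ($result === false) {
        throw new RuntimeException('preg_match failed, error code ' . preg_last_error());
    }

    return $result;
}

$regexChar = '/^[\x{0100}-\x{017f}]{1,63}$/iu';

// With the JIT off, the subject from this report should not match.
var_dump(matchWithoutJit($regexChar, ' domain.com'));
```

Setting pcre.jit=0 in php.ini is the simpler option when the whole application can live without the JIT; the per-call form above only limits the performance cost to the affected patterns.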
 [2018-09-20 21:56 UTC] cmb@php.net
-Package: *Regular Expressions +Package: PCRE related
 [2018-09-20 23:29 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2018-09-20 23:29 UTC] requinix@php.net
Marking NAB and I'll watch what happens with that report.
 [2018-09-21 06:21 UTC] nikic@php.net
-Status: Not a bug +Status: Re-Opened
 [2018-09-21 06:21 UTC] nikic@php.net
Reopening to track the update of the bundled libpcre2.
 [2018-09-21 07:57 UTC] requinix@php.net
For the record, I can reproduce with pcre2test. (The backslash is required to escape the leading space.)

  re> /^[\x{0100}-\x{017f}]{1,63}$/i,utf,jit=0
data> \ domain.com
No match
data>
  re> /^[\x{0100}-\x{017f}]{1,63}$/i,utf,jit=1
data> \ domain.com
 0:  domain.com
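
The same side-by-side comparison can be approximated from PHP itself. The sketch below is only an illustration: compareJit() is a hypothetical helper, and the use of two different delimiters assumes that PHP's compiled-pattern cache is keyed on the full pattern string, so that each run compiles the pattern afresh under the current pcre.jit value. On a build affected by this bug the two results would be expected to differ; on a fixed build both should be 0.

```php
<?php

/**
 * Evaluate one pattern body with the JIT off and then on.
 * compareJit() is a hypothetical helper, not part of PHP.
 * $body must not contain the delimiters "/" or "~".
 */
function compareJit(string $body, string $flags, string $subject): array
{
    $original = ini_get('pcre.jit');
    $runs = [
        'jit_off' => ['0', '/'],
        'jit_on'  => ['1', '~'],
    ];
    $results = [];

    // Different delimiters give each run a distinct pattern string, on the
    // assumption that the compiled-pattern cache is keyed on that string.
    foreach ($runs as $label => [$jit, $delimiter]) {
        ini_set('pcre.jit', $jit);
        $results[$label] = preg_match($delimiter . $body . $delimiter . $flags, $subject);
    }

    // Put the setting back the way it was, if it exists on this build.
    if ($original !== false) {
        ini_set('pcre.jit', $original);
    }

    return $results;
}

$results = compareJit('^[\x{0100}-\x{017f}]{1,63}$', 'iu', ' domain.com');
var_dump($results);
```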
 [2018-09-21 13:59 UTC] ab@php.net
Automatic comment on behalf of ab
Revision: http://git.php.net/?p=php-src.git;a=commit;h=7a02ecb7fef994332d83dae56ac5584d536f3de0
Log: Fixed bug #76909 preg_match difference between 7.3 and < 7.3
 [2018-09-21 13:59 UTC] ab@php.net
-Status: Re-Opened +Status: Closed
 