Bug #80645 PHP's behavior of empty string matching is incorrect
Submitted: 2021-01-19 13:03 UTC
Modified: 2021-01-19 13:27 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: firas dot dib at gmail dot com
Assigned:
Status: Open
Package: PCRE related
PHP Version: 8.0.1
OS: n/a
Private report: No
CVE-ID: None
 [2021-01-19 13:03 UTC] firas dot dib at gmail dot com
Description:
------------
PHP currently does not handle empty-string matches in a Perl-compatible way.

If we use the following regex: /(?<=(\G.{2}))(?!$)/g
and the following string: 123456789

In PCRE1 and PHP (both PHP <7.3 and >=7.3), capture group 1 yields: "12", "45", "78"
However, in PCRE2 (and Perl), it yields: "12", "34", "56", "78"

This can be verified with pcre2test and perltest, both of which are bundled with PCRE2.

The changelog for PCRE2 10.32 states the following:

21. In both pcre2test and pcre2_substitute(), with global matching, a pattern
that matched an empty string, but never at the starting match offset, was not
handled in a Perl-compatible way. The pattern /(<?=\G.)/ is an example of such
a pattern. Because \G is in a lookbehind assertion, there has to be a
"bumpalong" before there can be a match. The automatic "advance by one
character after an empty string match" rule is therefore inappropriate. A more
complicated algorithm has now been implemented.
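
To make the difference concrete, here is a minimal userland sketch of the two restart rules. It is not PHP's or PCRE2's actual implementation. The function name `collect_group1` and its `$advanceByOne` flag are hypothetical, and the sketch assumes a single-byte subject and a pattern whose capture group 1 always participates in the match. It relies on `preg_match()` treating its offset argument as the position where `\G` matches.

```php
<?php
/**
 * Illustrative global-match loop (hypothetical helper, not PHP internals).
 *
 * $advanceByOne = true : after any empty match, restart one character
 *                        past it (the "advance by one character" rule).
 * $advanceByOne = false: restart exactly where the empty match ended,
 *                        unless it sat at the start offset itself, so
 *                        that \G can line up with the previous match.
 */
function collect_group1(string $pattern, string $subject, bool $advanceByOne): array
{
    $captures = [];
    $offset = 0;
    $length = strlen($subject);

    while ($offset <= $length
        && preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        [$text, $pos] = $m[0];       // whole match and its byte offset
        $captures[] = $m[1][0];      // assumes group 1 always participates
        $end = $pos + strlen($text);

        if ($text === '' && ($advanceByOne || $pos === $offset)) {
            $offset = $end + 1;      // bump to avoid matching the same spot again
        } else {
            $offset = $end;          // continue from the end of the match
        }
    }

    return $captures;
}

$pattern = '/(?<=(\G.{2}))(?!$)/';
var_dump(collect_group1($pattern, '123456789', true));   // "12", "45", "78"
var_dump(collect_group1($pattern, '123456789', false));  // "12", "34", "56", "78"
```

With the flag set, the loop reproduces the "12", "45", "78" sequence reported for PHP; without it, the sequence is the one reported for PCRE2 and Perl. The real engines also retry at the same position with a not-empty constraint before bumping along, which this sketch leaves out.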

I am not sure whether this behavior is intentional, but I figured I'd bring it to your attention.

Best,
Firas

Test script:
---------------
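<?php
// Single quotes avoid accidental variable interpolation in the pattern.
preg_match_all('/(?<=(\G.{2}))(?!$)/', '123456789', $matches);

// $matches[1] holds the capture-group values compared below.
var_dump($matches);
?>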


Expected result:
----------------
"12", "34", "56", "78"

Actual result:
--------------
"12", "45", "78"

 [2021-01-19 13:27 UTC] cmb@php.net
Respective commit of PCRE2:
<https://vcs.pcre.org/pcre2?view=revision&revision=955>.
 
PHP Copyright © 2001-2021 The PHP Group
All rights reserved.
Last updated: Thu Jul 29 00:01:25 2021 UTC