Bug #76221 last optional capturing group is not included
Submitted: 2018-04-14 11:59 UTC
Modified: 2018-04-14 14:44 UTC
From: wes dot nospam at nospam dot example dot org
Assigned: cmb
Status: Duplicate
Package: PCRE related
PHP Version: 7.2.4
OS:
Private report: No
CVE-ID: None
 [2018-04-14 11:59 UTC] wes dot nospam at nospam dot example dot org
Description:
------------
Not sure this is a bug, adding in case it is.

There are two capturing groups in the regexp,
so I expect $matches to be of length 3.

https://3v4l.org/1ulVZ

hope this helps...

Test script:
---------------
preg_match("/^(aaa)(?:(x)?)/", "aaay", $matches, PREG_UNMATCHED_AS_NULL);
var_dump($matches);

Expected result:
----------------
array(3) {
  [0]=>
  string(3) "aaa"
  [1]=>
  string(3) "aaa"
  [2]=>
  NULL
}

Actual result:
--------------
array(2) {
  [0]=>
  string(3) "aaa"
  [1]=>
  string(3) "aaa"
}

 [2018-04-14 14:25 UTC] requinix@php.net
-Summary: last optional capturing group is not included when using PREG_UNMATCHED_AS_NULL
+Summary: last optional capturing group is not included
 [2018-04-14 14:25 UTC] requinix@php.net
PCRE2 reports to PHP one more than the highest-numbered capturing group that was actually set. In this pattern there are 2 groups, but the highest one to capture was #1. PHP then fills in the $matches array only up to that group and does not add entries for the groups after it. This happens with or without PREG_UNMATCHED_AS_NULL enabled.

https://www.pcre.org/current/doc/html/pcre2api.html#SEC29
> Offset values that correspond to unused subpatterns at the end of the expression are also set to PCRE2_UNSET. For
> example, if the string "abc" is matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched. The
> return from the function is 2, because the highest used capturing subpattern number is 1.
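
The same truncation can be seen from PHP without passing any flag. A minimal sketch, reusing the pattern and subject from the test script above; nothing here goes beyond the stock preg_match() call:

```php
<?php
// Pattern and subject taken from the report's test script.
// No flags are passed, so this shows the default behaviour.
preg_match('/^(aaa)(?:(x)?)/', 'aaay', $matches);

// Group 2 is the last group in the pattern and did not take part in
// the match, so PHP adds no entry for it: only keys 0 and 1 exist.
var_dump(count($matches));
var_dump(array_key_exists(2, $matches));
```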

You can see that the option itself works correctly by having a later capturing group match, such as with /^(aaa)(?:(x)?)(y)/. Then [1] and [3] will be set, while [2] will be null with PREG_UNMATCHED_AS_NULL or an empty string without it.
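
A short sketch of that comparison, run once with the flag and once without. The variable names $withFlag and $withoutFlag are only illustrative:

```php
<?php
$pattern = '/^(aaa)(?:(x)?)(y)/';
$subject = 'aaay';

// With the flag: the unmatched middle group is reported as NULL.
preg_match($pattern, $subject, $withFlag, PREG_UNMATCHED_AS_NULL);

// Without the flag: the unmatched middle group is an empty string.
preg_match($pattern, $subject, $withoutFlag);

// In both cases group 3 matched, so the array runs up to index 3.
var_dump($withFlag);
var_dump($withoutFlag);
```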

The most obvious solution is for PHP to determine the number of capturing groups with pcre2_pattern_info() and the PCRE2_INFO_CAPTURECOUNT option instead of using the return value from pcre2_match/pcre2_jit_match.
https://www.pcre.org/current/doc/html/pcre2api.html#TOC1
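
For callers affected by this, one possible userland workaround is to pad the array when the number of groups is known in advance. This is only a sketch: preg_match_padded() and its $groupCount parameter are hypothetical names made up for illustration, not part of PHP, and the group count is passed in by hand rather than obtained from PCRE2.

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): runs preg_match() and pads
 * the result with NULL so it always has $groupCount + 1 entries.
 * Returns null when the pattern does not match or an error occurs.
 */
function preg_match_padded(string $pattern, string $subject, int $groupCount): ?array
{
    $result = preg_match($pattern, $subject, $matches, PREG_UNMATCHED_AS_NULL);
    if ($result !== 1) {
        return null;
    }
    // Index 0 is the whole match, so the target size is groups + 1.
    // array_pad() leaves the array unchanged if it is already that long.
    return array_pad($matches, $groupCount + 1, null);
}

// The pattern from the report has two capturing groups.
$padded = preg_match_padded('/^(aaa)(?:(x)?)/', 'aaay', 2);
var_dump($padded);
```

Because array_pad() only appends missing entries, the helper gives the same shape whether or not the underlying preg_match() call already includes the trailing group.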
 [2018-04-14 14:39 UTC] cmb@php.net
-Status: Open
+Status: Duplicate
-Assigned To:
+Assigned To: cmb
 [2018-04-14 14:39 UTC] cmb@php.net
This is basically a duplicate of bug #73948.  While the solution
to that bug was not clear due to the BC break, since we have
PREG_UNMATCHED_AS_NULL now, we should cater to that.

@requinix: the number of subpatterns is already available in
num_subpats[1].

[1] <https://github.com/php/php-src/blob/PHP-7.1.0/ext/pcre/php_pcre.c#L774>
 [2018-04-14 14:44 UTC] requinix@php.net
I thought this sounded familiar...
 
Last updated: Mon Oct 07 11:01:28 2024 UTC