Bug #76221 last optional capturing group is not included
Submitted: 2018-04-14 11:59 UTC Modified: 2018-04-14 14:44 UTC
From: wes dot nospam at nospam dot example dot org Assigned: cmb
Status: Duplicate Package: PCRE related
PHP Version: 7.2.4 OS:
Private report: No CVE-ID: None
 [2018-04-14 11:59 UTC] wes dot nospam at nospam dot example dot org
Description:
------------
Not sure whether this is a bug; reporting it in case it is.

There are two capturing groups in the regexp,
so I expect $matches to be of length 3.

https://3v4l.org/1ulVZ

hope this helps...

Test script:
---------------
preg_match("/^(aaa)(?:(x)?)/", "aaay", $matches, PREG_UNMATCHED_AS_NULL);
var_dump($matches);

Expected result:
----------------
array(3) {
  [0]=>
  string(3) "aaa"
  [1]=>
  string(3) "aaa"
  [2]=>
  NULL
}

Actual result:
--------------
array(2) {
  [0]=>
  string(3) "aaa"
  [1]=>
  string(3) "aaa"
}

History
 [2018-04-14 14:25 UTC] requinix@php.net
-Summary: last optional capturing group is not included when using PREG_UNMATCHED_AS_NULL
+Summary: last optional capturing group is not included
 [2018-04-14 14:25 UTC] requinix@php.net
PCRE2 returns to PHP one more than the number of the highest capturing group that was set. In this pattern there are 2 groups, but the highest one to capture was #1, so the return value is 2. PHP then fills in the array only up to that group and does not continue with the groups after it. This happens with or without PREG_UNMATCHED_AS_NULL enabled.

https://www.pcre.org/current/doc/html/pcre2api.html#SEC29
> Offset values that correspond to unused subpatterns at the end of the expression are also set to PCRE2_UNSET. For
> example, if the string "abc" is matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched. The
> return from the function is 2, because the highest used capturing subpattern number is 1.
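A minimal sketch of the behaviour described above, assuming any PHP 7.2+ CLI: when no flags are passed, the trailing unmatched group 2 is simply absent from `$matches`, even though the pattern defines two capturing groups. The variable names are illustrative only.

```php
<?php
// Pattern with two capturing groups; group 2 cannot match against "aaay".
$pattern = '/^(aaa)(?:(x)?)/';

// No flags: PHP builds $matches only up to the highest group that captured.
preg_match($pattern, 'aaay', $plain);

var_dump(count($plain));               // int(2): only [0] and [1]
var_dump(array_key_exists(2, $plain)); // bool(false): group 2 is missing
```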

You can see that the option itself works correctly by adding a later capturing group that does match, such as with /^(aaa)(?:(x)?)(y)/. Then [1] and [3] will be set, while [2] will be null with PREG_UNMATCHED_AS_NULL or an empty string without it.
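A short sketch of that check, assuming PHP 7.2+ (where PREG_UNMATCHED_AS_NULL exists): because group 3 captures, PCRE2 reports it as the highest set group, and PHP fills in every index up to it, including the unmatched group 2 in the middle.

```php
<?php
// Group 3 "(y)" matches, so indices up to 3 are all present in $matches.
$pattern = '/^(aaa)(?:(x)?)(y)/';

preg_match($pattern, 'aaay', $withNull, PREG_UNMATCHED_AS_NULL);
preg_match($pattern, 'aaay', $withoutFlag);

var_dump($withNull[2]);    // NULL: unmatched group reported as null
var_dump($withoutFlag[2]); // string(0) "": unmatched group reported as ""
var_dump($withNull[3]);    // string(1) "y"
```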

The most obvious solution is for PHP to determine the number of capturing groups with pcre2_pattern_info() and the PCRE2_INFO_CAPTURECOUNT option instead of using the return value from pcre2_match/pcre2_jit_match.
https://www.pcre.org/current/doc/html/pcre2api.html#TOC1
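The same idea can be approximated in userland as a stopgap: take the group count from somewhere other than the match result and pad the missing trailing indices. In this sketch, `pad_matches()` is a hypothetical helper (not part of PHP), and the caller is assumed to know the pattern's group count, standing in for the PCRE2_INFO_CAPTURECOUNT value that the C code could query.

```php
<?php
/**
 * Hypothetical helper: give every capturing group 1..$groupCount an index
 * in a preg_match() result, using null for trailing groups PHP left out.
 * Only numeric indices are padded; named-group keys are not added.
 */
function pad_matches(array $matches, int $groupCount): array
{
    if ($matches === []) {
        return $matches; // no match at all: nothing to pad
    }
    for ($i = 1; $i <= $groupCount; $i++) {
        if (!array_key_exists($i, $matches)) {
            // Missing groups are always trailing, so appending keeps order.
            $matches[$i] = null;
        }
    }
    return $matches;
}

preg_match('/^(aaa)(?:(x)?)/', 'aaay', $m, PREG_UNMATCHED_AS_NULL);
$m = pad_matches($m, 2); // assumption: the caller knows the pattern has 2 groups
var_dump($m);
```

If PHP already returns every group, the helper leaves the array unchanged, so it is safe to apply either way.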
 [2018-04-14 14:39 UTC] cmb@php.net
-Status: Open
+Status: Duplicate
-Assigned To:
+Assigned To: cmb
 [2018-04-14 14:39 UTC] cmb@php.net
This is basically a duplicate of bug #73948. The fix for that bug
was unclear because it would have been a BC break, but now that we
have PREG_UNMATCHED_AS_NULL, we should handle this case when that
flag is set.

@requinix: the number of subpatterns is already available in
num_subpats[1].

[1] <https://github.com/php/php-src/blob/PHP-7.1.0/ext/pcre/php_pcre.c#L774>
 [2018-04-14 14:44 UTC] requinix@php.net
I thought this sounded familiar...