Bug #79257 Duplicate named groups (?J) prefer last alternative even if not matched
Submitted: 2020-02-11 13:11 UTC Modified: 2020-02-11 16:48 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: ptomulik at meil dot pw dot edu dot pl Assigned: nikic (profile)
Status: Closed Package: PCRE related
PHP Version: 7.3+ OS: Debian/Ubuntu
Private report: No CVE-ID: None
 [2020-02-11 13:11 UTC] ptomulik at meil dot pw dot edu dot pl
Description:
------------
It appears that the "backward incompatible changes" in PHP 7.4 ("Regular Expressions"), as described here:

  https://www.php.net/manual/en/migration74.incompatible.php

affect named capture groups when duplicate names are enabled (the (?J) modifier / PCRE2_DUPNAMES option). In 7.4, a named capture group always returns the content captured by the last alternative, no matter which of the alternatives matched. This differs from 7.3 and renders the DUPNAMES feature useless.

Test script:
---------------
<?php
preg_match('/(?J)(?:(?<g>foo)|(?<g>bar))/', 'foo', $matches, PREG_UNMATCHED_AS_NULL);
var_dump($matches);
?>



Expected result:
----------------
array(4) {
  [0]=>
  string(3) "foo"
  ["g"]=>
  string(3) "foo"
  [1]=>
  string(3) "foo"
  [2]=>
  NULL
}

Actual result:
--------------
array(4) {
  [0]=>
  string(3) "foo"
  ["g"]=>
  NULL
  [1]=>
  string(3) "foo"
  [2]=>
  NULL
}


 [2020-02-11 13:15 UTC] nikic@php.net
-Status: Open +Status: Verified
 [2020-02-11 15:41 UTC] nikic@php.net
Parts of this actually also seem broken on earlier versions, for example:

preg_match('/(?J)(?:(?<g>foo)|(?<g>bar))(?<h>baz)/', 'foobaz', $matches);
var_dump($matches);

array(6) {
  [0]=>
  string(6) "foobaz"
  ["g"]=>
  string(0) ""
  [1]=>
  string(3) "foo"
  [2]=>
  string(0) ""
  ["h"]=>
  string(3) "baz"
  [3]=>
  string(3) "baz"
}

Key "g" should be "foo" here though.
 [2020-02-11 16:13 UTC] ptomulik at meil dot pw dot edu dot pl
Looks like the actual problem already existed in 7.3 and earlier, but the test script needs a slight modification to reproduce it there. I'll try to update the bug report to take this into account.
 [2020-02-11 16:26 UTC] ptomulik at meil dot pw dot edu dot pl
-Summary: Duplicate named groups ((?J) modifier) broken since 7.4 +Summary: Duplicate named groups (?J) prefer last alternative even if not matched
-PHP Version: 7.4.2 +PHP Version: 7.3+
 [2020-02-11 16:26 UTC] ptomulik at meil dot pw dot edu dot pl
This actually happens on all versions if we consider the following example:

<?php
preg_match('/(?J)(?:(?<g>foo)|(?<g>bar))(geez)/', 'foogeez', $matches, PREG_UNMATCHED_AS_NULL);
var_dump($matches);
?>

Expected result:
----------------
array(5) {
  [0]=>
  string(7) "foogeez"
  ["g"]=>
  string(3) "foo"
  [1]=>
  string(3) "foo"
  [2]=>
  NULL
  [3]=>
  string(4) "geez"
}

Actual result:
--------------
array(5) {
  [0]=>
  string(7) "foogeez"
  ["g"]=>
  NULL
  [1]=>
  string(3) "foo"
  [2]=>
  NULL
  [3]=>
  string(4) "geez"
}
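
In every dump in this report the numbered entries ([1], [2]) are correct and only the named key "g" is affected. A minimal sketch of reading the value from the numbered groups instead is given here. The helper name first_matched_group() is hypothetical (it is not part of PHP or of this report), and the sketch assumes PREG_UNMATCHED_AS_NULL is passed so that unmatched groups are NULL rather than empty strings.

```php
<?php
// Hypothetical helper (not part of PHP or of this report): return the first
// of the given numbered groups that actually took part in the match.
function first_matched_group(array $matches, array $indexes): ?string
{
    foreach ($indexes as $i) {
        // isset() is false both for a missing key and for a NULL entry,
        // so it also copes with versions that drop trailing unmatched groups.
        if (isset($matches[$i])) {
            return $matches[$i];
        }
    }
    return null;
}

$pattern = '/(?J)(?:(?<g>foo)|(?<g>bar))(geez)/';

preg_match($pattern, 'foogeez', $matches, PREG_UNMATCHED_AS_NULL);
$g = first_matched_group($matches, [1, 2]); // groups 1 and 2 both carry the name "g"
var_dump($g); // string(3) "foo"
```

The list of indexes has to be kept in sync with the pattern by hand, which is the main drawback of this approach.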
 [2020-02-11 16:32 UTC] nikic@php.net
Automatic comment on behalf of nikita.ppv@gmail.com
Revision: http://git.php.net/?p=php-src.git;a=commit;h=3a515309631fcacd80ee1f6e247965a0c4626786
Log: Fixed bug #79257
 [2020-02-11 16:32 UTC] nikic@php.net
-Status: Verified +Status: Closed
 [2020-02-11 16:48 UTC] nikic@php.net
-Assigned To: +Assigned To: nikic
 [2020-02-11 16:48 UTC] nikic@php.net
I've fixed this in PHP-7.4. I don't want to fix this on 7.3. It's a long-standing issue, and changes to PCRE matches output are risky.
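
For versions that will not receive the fix, one possible way to sidestep the issue is sketched below. It is not taken from this report and rests on an assumption about the pattern: the alternatives can be wrapped in a PCRE branch reset group (?|...), so that both of them share group number 1 and the name "g" is declared on a single group, without (?J). The variable names are illustrative only.

```php
<?php
// Branch reset (?|...): each alternative restarts group numbering, so both
// (?<g>...) groups are group 1 and the duplicate name needs no (?J).
$pattern = '/(?|(?<g>foo)|(?<g>bar))(?<h>baz)/';

preg_match($pattern, 'foobaz', $viaFoo);
preg_match($pattern, 'barbaz', $viaBar);

// "g" is filled from whichever alternative matched; "h" is group 2.
var_dump($viaFoo['g'], $viaBar['g'], $viaFoo['h']);
```

This only applies when the duplicate names sit in alternatives of the same group; patterns that reuse a name in unrelated positions still need (?J).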
 
Last updated: Sat Dec 21 14:01:32 2024 UTC