[2012-04-20 00:54 UTC] danielklein at airpost dot net
Description:
------------
Named and unnamed captures in both preg_match and preg_match_all (and probably preg_replace and the other PCRE functions too, though I haven't tested them all) can report the wrong set of capture groups when alternation or a quantifier that allows zero repetitions (such as ? or *) is used.
If the pattern '/(?<b>b)|(?<c>c)|(?<d>d)/' is used to match 'c', both 'b' and 'c' are set in the results array but 'd' is not. 'b' should not be set (not even to an empty string) because it failed to match anything. However, if the pattern were '/(?<b>b?)(?<c>c)/' (note: optional 'b' AND mandatory 'c'), 'b' _should_ be set to '' because it is allowed to match a zero-length string. If a group is attempted but fails, and a group later in the pattern matches, the failed group should never produce a key in the results array. It should be OK to leave holes in the numbered sequence (e.g. match 0 and 2 but not 1).
Currently, you have to use PREG_OFFSET_CAPTURE, check whether the key exists and, if it does, check whether the capture offset is -1. If this bug is fixed, capture offsets will never be -1, because the key won't exist. Alternatively, an additional flag could be added (e.g. PREG_KEEP_NONMATCHES) to create keys for ALL captures whether used or not (so, with the first pattern above, keys would be created for 'b', 'c' and 'd' in all cases, and when matching the string 'c' the offsets for both 'b' and 'd' would be -1).
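The workaround described above can be sketched as follows. This is a minimal sketch, assuming PHP 7.4 or newer (for the arrow function); remove_unset_captures() is a hypothetical helper, not part of PHP. It relies on a non-participating group being reported as ['', -1] when PREG_OFFSET_CAPTURE is used.

```php
<?php
// Assumes PHP 7.4+ (arrow functions).
// Hypothetical helper, not part of PHP: with PREG_OFFSET_CAPTURE, a group
// that did not take part in the match is reported as ['', -1], so dropping
// every entry whose offset is -1 leaves only the captures that matched.
function remove_unset_captures(array $matches): array
{
    return array_filter(
        $matches,
        static fn (array $capture): bool => $capture[1] !== -1
    );
}

$pattern = '/(?:(?<b>b)|(?<c>c)|(?<d>d))(?<e>e)?/';
preg_match($pattern, 'cdec', $matches, PREG_OFFSET_CAPTURE);

// Keys 'b' and 1 are removed here; the trailing unmatched groups
// ('d', 'e') are already absent from the preg_match() result.
$captures = remove_unset_captures($matches);
var_export(array_keys($captures));
```

array_filter() preserves keys, so the surviving entries keep their original names and numbers, leaving holes in the numbered sequence as the report proposes.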
In summary, if the pattern '/(?<b>b)|(?<c>c)|(?<d>d)/' is used to match 'c', by default it should create a key only for 'c' (along with key 0 for the whole match and numbered key 2). If desired, an additional flag could be added so that it creates keys for all captures: 'b', 'c' and 'd'. The current behaviour, where it creates keys for 'b' and 'c' but not 'd', should be considered a bug and fixed.
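Both behaviours described in the summary can be approximated with a flag available in later PHP versions. This is a sketch assuming PHP 7.4 or newer, where PREG_UNMATCHED_AS_NULL reports every non-participating group (trailing ones included) as null rather than ''. It is not the PREG_KEEP_NONMATCHES flag proposed above, only something close to it.

```php
<?php
// Assumes PHP 7.4+: with PREG_UNMATCHED_AS_NULL, groups that did not
// participate (trailing ones included) are reported as null, not ''.
$pattern = '/(?<b>b)|(?<c>c)|(?<d>d)/';
preg_match($pattern, 'c', $all, PREG_UNMATCHED_AS_NULL);

// "All keys" behaviour: $all has keys for 'b', 'c' and 'd' (plus their
// numbered twins); only keys 0, 'c' and 2 hold a string.
var_export($all);

// "Matched keys only" behaviour: drop the nulls.
$matched = array_filter($all, static fn (?string $v): bool => $v !== null);
var_export($matched);
```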
Test script:
---------------
<?php
print('<pre>');

// Match one occurrence at a time, restarting just past the previous match.
$offset = 0;
while (preg_match('/(?:(?<b>b)|(?<c>c)|(?<d>d))(?<e>e)?/', 'cdec', $matches, PREG_OFFSET_CAPTURE, $offset)) {
    $offset = $matches[0][1] + strlen($matches[0][0]);
    var_export($matches);
    print("\n\n");
}

print("****************\n\n");

// Same pattern, all matches at once, grouped per match.
preg_match_all('/(?:(?<b>b)|(?<c>c)|(?<d>d))(?<e>e)?/', 'cdec', $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
var_export($matches);
print('</pre>');
Expected result:
----------------
array (
  0 =>
  array (
    0 => 'c',
    1 => 0,
  ),
  'c' =>
  array (
    0 => 'c',
    1 => 0,
  ),
  2 =>
  array (
    0 => 'c',
    1 => 0,
  ),
)

array (
  0 =>
  array (
    0 => 'de',
    1 => 1,
  ),
  'd' =>
  array (
    0 => 'd',
    1 => 1,
  ),
  3 =>
  array (
    0 => 'd',
    1 => 1,
  ),
  'e' =>
  array (
    0 => 'e',
    1 => 2,
  ),
  4 =>
  array (
    0 => 'e',
    1 => 2,
  ),
)

array (
  0 =>
  array (
    0 => 'c',
    1 => 3,
  ),
  'c' =>
  array (
    0 => 'c',
    1 => 3,
  ),
  2 =>
  array (
    0 => 'c',
    1 => 3,
  ),
)

****************

array (
  0 =>
  array (
    0 =>
    array (
      0 => 'c',
      1 => 0,
    ),
    'c' =>
    array (
      0 => 'c',
      1 => 0,
    ),
    2 =>
    array (
      0 => 'c',
      1 => 0,
    ),
  ),
  1 =>
  array (
    0 =>
    array (
      0 => 'de',
      1 => 1,
    ),
    'd' =>
    array (
      0 => 'd',
      1 => 1,
    ),
    3 =>
    array (
      0 => 'd',
      1 => 1,
    ),
    'e' =>
    array (
      0 => 'e',
      1 => 2,
    ),
    4 =>
    array (
      0 => 'e',
      1 => 2,
    ),
  ),
  2 =>
  array (
    0 =>
    array (
      0 => 'c',
      1 => 3,
    ),
    'c' =>
    array (
      0 => 'c',
      1 => 3,
    ),
    2 =>
    array (
      0 => 'c',
      1 => 3,
    ),
  ),
)
Actual result:
--------------
array (
  0 =>
  array (
    0 => 'c',
    1 => 0,
  ),
  'b' =>
  array (
    0 => '',
    1 => -1,
  ),
  1 =>
  array (
    0 => '',
    1 => -1,
  ),
  'c' =>
  array (
    0 => 'c',
    1 => 0,
  ),
  2 =>
  array (
    0 => 'c',
    1 => 0,
  ),
)

array (
  0 =>
  array (
    0 => 'de',
    1 => 1,
  ),
  'b' =>
  array (
    0 => '',
    1 => -1,
  ),
  1 =>
  array (
    0 => '',
    1 => -1,
  ),
  'c' =>
  array (
    0 => '',
    1 => -1,
  ),
  2 =>
  array (
    0 => '',
    1 => -1,
  ),
  'd' =>
  array (
    0 => 'd',
    1 => 1,
  ),
  3 =>
  array (
    0 => 'd',
    1 => 1,
  ),
  'e' =>
  array (
    0 => 'e',
    1 => 2,
  ),
  4 =>
  array (
    0 => 'e',
    1 => 2,
  ),
)

array (
  0 =>
  array (
    0 => 'c',
    1 => 3,
  ),
  'b' =>
  array (
    0 => '',
    1 => -1,
  ),
  1 =>
  array (
    0 => '',
    1 => -1,
  ),
  'c' =>
  array (
    0 => 'c',
    1 => 3,
  ),
  2 =>
  array (
    0 => 'c',
    1 => 3,
  ),
)

****************

array (
  0 =>
  array (
    0 =>
    array (
      0 => 'c',
      1 => 0,
    ),
    'b' =>
    array (
      0 => '',
      1 => -1,
    ),
    1 =>
    array (
      0 => '',
      1 => -1,
    ),
    'c' =>
    array (
      0 => 'c',
      1 => 0,
    ),
    2 =>
    array (
      0 => 'c',
      1 => 0,
    ),
  ),
  1 =>
  array (
    0 =>
    array (
      0 => 'de',
      1 => 1,
    ),
    'b' =>
    array (
      0 => '',
      1 => -1,
    ),
    1 =>
    array (
      0 => '',
      1 => -1,
    ),
    'c' =>
    array (
      0 => '',
      1 => -1,
    ),
    2 =>
    array (
      0 => '',
      1 => -1,
    ),
    'd' =>
    array (
      0 => 'd',
      1 => 1,
    ),
    3 =>
    array (
      0 => 'd',
      1 => 1,
    ),
    'e' =>
    array (
      0 => 'e',
      1 => 2,
    ),
    4 =>
    array (
      0 => 'e',
      1 => 2,
    ),
  ),
  2 =>
  array (
    0 =>
    array (
      0 => 'c',
      1 => 3,
    ),
    'b' =>
    array (
      0 => '',
      1 => -1,
    ),
    1 =>
    array (
      0 => '',
      1 => -1,
    ),
    'c' =>
    array (
      0 => 'c',
      1 => 3,
    ),
    2 =>
    array (
      0 => 'c',
      1 => 3,
    ),
  ),
)
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Thu Oct 30 19:00:01 2025 UTC
Here is a reproducible example (PHP 5.3.20 and 5.3.21) where named captures do not return any matches at all! I've tested this pattern against the PCRE implementation in another language and it worked.

<?php
$QQ = chr(92) . chr(34);
$delimeters = "{}";
$del0 = preg_quote($delimeters[0]);
$del1 = preg_quote($delimeters[1]);
$tag = "language";
$string = "fdfdfdfdf{language=1}testhgg";
$preg = "~" . $del0 . $tag . "\s*=\s*(?P<" . "quote>[" . $QQ . "\']*)(?P<att>.*?)(?P=quote)\s*/" . $del1 . "~";
$match = array();
preg_match($preg, $string, $match);
echo "<br>string = " . htmlspecialchars($string) . "<br>preg=" . htmlspecialchars($preg) . "<br>match:<pre>";
var_dump($match);
echo "</pre>";
?>

Output from pcretest:

  re> /(4)?(2)?\d/g
data> 123456
 0: 1
 0: 23
 1: <unset>
 2: 2
 0: 45
 1: 4
 0: 6
data>

Equivalent PHP code:

<?php
preg_match_all('/(4)?(2)?\d/', '123456', $matches, PREG_SET_ORDER);
var_export($matches);
/* Outputs:
array (
  0 =>
  array (
    0 => '1',
  ),
  1 =>
  array (
    0 => '23',
    1 => '', // This key should not exist: pcretest reports this group as unset, not blank
    2 => '2',
  ),
  2 =>
  array (
    0 => '45',
    1 => '4',
  ),
  3 =>
  array (
    0 => '6',
  ),
)
*/
?>

pcretest output for the original report:

  re> /(?:(?<b>b)|(?<c>c)|(?<d>d))(?<e>e)?/
data> cdec
 0: c
 1: <unset>
 2: c
data>
  re> /(?:(?<b>b)|(?<c>c)|(?<d>d))(?<e>e)?/g
data> cdec
 0: c
 1: <unset>
 2: c
 0: de
 1: <unset>
 2: <unset>
 3: d
 4: e
 0: c
 1: <unset>
 2: c
data>

Showing the difference between a blank capture and no capture (unset):

  re> /(a)?(b?)(c)?/
data> a
 0: a
 1: a
 2:
data> b
 0: b
 1: <unset>
 2: b
data> c
 0: c
 1: <unset>
 2:
 3: c
data>
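The blank-versus-unset distinction shown in the last pcretest session can be reproduced in PHP. This is a sketch assuming PHP 7.4 or newer and PREG_UNMATCHED_AS_NULL; describe_groups() is a hypothetical helper that labels non-participating groups '<unset>' the way pcretest does. Unlike pcretest, it also lists trailing unset groups.

```php
<?php
// Assumes PHP 7.4+: PREG_UNMATCHED_AS_NULL reports non-participating
// groups (trailing ones included) as null, while a group that matched
// an empty string stays ''.
// Hypothetical helper: label null groups '<unset>', as pcretest does.
function describe_groups(string $pattern, string $subject): array
{
    preg_match($pattern, $subject, $m, PREG_UNMATCHED_AS_NULL);
    return array_map(
        static fn (?string $g): string => $g === null ? '<unset>' : $g,
        $m
    );
}

foreach (['a', 'b', 'c'] as $subject) {
    var_export(describe_groups('/(a)?(b?)(c)?/', $subject));
    echo "\n";
}
```

Group 2, (b?), is always reported as a string (possibly ''), while groups 1 and 3 become '<unset>' whenever their optional quantifier skips them.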