go to bug id or search bugs for
Please see the test case, the expected result and the actual result in the bug report below.
From analyzing the code, I have identified that the error comes from the following line of code (plus similar code in other functions): https://github.com/php/php-src/blob/0787cd60ed3d0c8c8c8ff7e49b9bb3587bf33b64/ext/pcre/php_pcre.c#L891
g_notempty = (offsets == offsets)? PCRE_NOTEMPTY | PCRE_ANCHORED : 0;
/* Advance to the position right after the last full match */
start_offset = offsets;
This part of the code checks whether the match is empty. If the match it empty, it tries to perform another match from the last position of the match with PCRE_NOTEMPTY flag to force the match to be non-empty.
However, it's incorrect to set the PCRE_NOTEMPTY flag by checking the length of the match based on the start and end indices of the match. For the regex in the test case ~(?: |\G)\d\B\K~ (which always results in empty string matches due to \K), the index where we execute the match from (start_offset) and the start index of the match (offsets) are always different. Therefore, the regex actually advances ahead in the string, and setting PCRE_NOTEMPTY in such case rejects valid empty string matches.
I believe (offsets == offsets) should be changed to (offsets == start_offset). If start_offset <= offsets <= offsets is an invariant, then the change should be correct.
Original report and analysis: http://chat.stackoverflow.com/transcript/message/24995536#24995536
$str = "123 a123 1234567 b123 123";
$str = preg_replace('~(?: |\G)\d\B\K~', "*", $str);
1*2*3 a123 1*2*3*4*5*6*7 b123 1*2*3
The expected result (10 matches/replacements) can be observed on
- pcretest with PCRE version 8.35 2014-04-04
- regex101 https://regex101.com/r/oP9mZ7/3
1*23 a123 1*23*45*67 b123 1*23
Add a Patch
Add a Pull Request
Related To: Bug #70233
I can confirm the issue, and that this case would be solved by
your suggestion (checking for offsets == start_offset).
However, that would break some other cases, e.g.
preg_replace('/\b/', '*', '(#11/19/2002#)')
Oh, you are right. I forgot that the exec function bumps the match along automatically, so relying on start_offset is a bad idea.
Looking at the source code of pcretest, it seems that it uses a different option. Can you check whether the option is usable?
4233 if (use_offsets == use_offsets)
4235 if (use_offsets == len) break;
4236 g_notempty = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED;
Great, thanks! PCRE_NOTEMPTY_ATSTART is exactly what is needed.
It's available as of PCRE 8.00 (2009-10-19), so maybe we need
to have a workaround for older libpcre, even though this bug would
not be fixed for such versions.
For old version, I guess the bug can be fixed by using PCRE_ANCHORED all time, so that we take full control of the bump-along. We then can rely on (offsets == start_offset) to check that the match has advanced.
As for work-around, I'm not sure whether a general work-around exists. It should probably be handled on case-by-case basis.
> For old version, I guess the bug can be fixed by using
> PCRE_ANCHORED all time, so that we take full control of the
> bump-along. We then can rely on (offsets == start_offset) to
> check that the match has advanced.
I gave that a quick try, and got 35 failing tests. I have not
investigated further, as it seems to me that PCRE_NOTEMPTY_ATSTART
would not have been introduced, if a general fix would have been
I still prefer to use PCRE_NOTEMPTY_ATSTART instead of
PCRE_NOTEMPTY, if it's available, and to simply fall back to
PCRE_NOTEMPTY otherwise. This will leave this bug unresolved for
PCRE >= 7.2 and < 8.0, but any user who is bitten by this may
update their libpcre.
After some internal discussion it turned out that lifting the
requirements to libpcre >= 8.0 is not an option for now.
Automatic comment on behalf of cmb
Log: Fix #70232: Incorrect bump-along behavior with \K and empty string match