[2015-08-11 04:57 UTC] nhahtdh at gmail dot com
Description:
------------
Please see the test case, the expected result and the actual result in the bug report below.

From analyzing the code, I have identified that the error comes from the following lines of code (plus similar code in other functions):

https://github.com/php/php-src/blob/0787cd60ed3d0c8c8c8ff7e49b9bb3587bf33b64/ext/pcre/php_pcre.c#L891

    g_notempty = (offsets[1] == offsets[0])? PCRE_NOTEMPTY | PCRE_ANCHORED : 0;

    /* Advance to the position right after the last full match */
    start_offset = offsets[1];

This part of the code checks whether the match is empty. If the match is empty, it tries to perform another match from the end position of that match with the PCRE_NOTEMPTY flag, to force the next match to be non-empty.

However, it is incorrect to decide whether to set PCRE_NOTEMPTY by comparing the start and end indices of the match. For the regex in the test case, ~(?: |\G)\d\B\K~ (which always results in empty string matches due to \K), the index where the match is executed from (start_offset) and the start index of the match (offsets[0]) are always different. Therefore, the regex actually advances through the string, and setting PCRE_NOTEMPTY in such a case rejects valid empty string matches.

I believe (offsets[1] == offsets[0]) should be changed to (offsets[1] == start_offset). If start_offset <= offsets[0] <= offsets[1] is an invariant, then the change should be correct.

Original report and analysis: http://chat.stackoverflow.com/transcript/message/24995536#24995536

Test script:
---------------
<?php
$str = "123 a123 1234567 b123 123";
$str = preg_replace('~(?: |\G)\d\B\K~', "*", $str);
echo $str;
?>

Expected result:
----------------
1*2*3 a123 1*2*3*4*5*6*7 b123 1*2*3

The expected result (10 matches/replacements) can be observed on
- pcretest with PCRE version 8.35 2014-04-04
- regex101 https://regex101.com/r/oP9mZ7/3

Actual result:
--------------
1*23 a123 1*23*45*67 b123 1*23
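To make the reasoning in the description easier to follow, here is a simplified C model of the replace loop. It is only a sketch under stated assumptions: match_kd() is a hypothetical hand-written stand-in for pcre_exec() that handles just the pattern (?: |\G)\d\B\K on ASCII input, and replace_model() imitates the retry logic around g_notempty rather than reproducing php_pcre.c. None of these function names exist in php-src.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Word-character test that treats out-of-range indexes as non-word. */
static int word_at(const char *s, size_t n, long i) {
    return i >= 0 && (size_t)i < n &&
           (isalnum((unsigned char)s[i]) || s[i] == '_');
}

/* \B holds at position p when both neighbours agree on "wordness". */
static int not_boundary(const char *s, size_t n, size_t p) {
    return word_at(s, n, (long)p - 1) == word_at(s, n, (long)p);
}

/*
 * Hypothetical stand-in for pcre_exec() on (?: |\G)\d\B\K only.
 * On success it stores the match bounds in off[0], off[1] and returns 1.
 * Because of \K every match is empty, so a NOTEMPTY attempt always fails.
 */
static int match_kd(const char *s, size_t n, size_t start, int notempty,
                    size_t off[2]) {
    if (notempty) return 0;
    for (size_t p = start; p < n; p++) {
        size_t d;                        /* index of the digit to consume */
        if (s[p] == ' ') d = p + 1;      /* first alternative: a space */
        else if (p == start) d = p;      /* second alternative: \G */
        else continue;
        if (isdigit((unsigned char)s[d]) && not_boundary(s, n, d + 1)) {
            off[0] = off[1] = d + 1;     /* \K resets the match start */
            return 1;
        }
    }
    return 0;
}

/*
 * Model of the replace loop. proposed == 0 uses the current test
 * (offsets[1] == offsets[0]); proposed != 0 uses (offsets[1] == start_offset).
 * out must have room for at least 3 * strlen(s) + 3 bytes.
 */
static void replace_model(const char *s, int proposed, char *out) {
    assert(s != NULL && out != NULL);
    size_t n = strlen(s), start = 0, o = 0, off[2];
    int notempty = 0;
    for (;;) {
        if (match_kd(s, n, start, notempty, off)) {
            memcpy(out + o, s + start, off[0] - start); /* text before match */
            o += off[0] - start;
            out[o++] = '*';                             /* the replacement */
            notempty = proposed ? (off[1] == start) : (off[1] == off[0]);
            start = off[1];
        } else if (notempty && start < n) {
            out[o++] = s[start++];  /* retry failed: copy one char, move on */
            notempty = 0;
        } else {
            break;
        }
    }
    strcpy(out + o, s + start);     /* copy the unmatched tail */
}
```

Calling replace_model("123 a123 1234567 b123 123", 0, out) in this model yields the reported actual result, 1*23 a123 1*23*45*67 b123 1*23, because every empty match triggers a NOTEMPTY retry that fails and skips one character. With proposed set to 1 the model yields the expected 1*2*3 a123 1*2*3*4*5*6*7 b123 1*2*3.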
I can confirm the issue, and that this case would be solved by your suggestion (checking for offsets[1] == start_offset). However, that would break some other cases, e.g.

    preg_replace('/\b/', '*', '(#11/19/2002#)')

Expected:
    string(20) "(#*11*/*19*/*2002*#)"
Actual (with the suggested change):
    string(24) "(#**11**/*19**/*2002**#)"
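The doubled replacements can be reproduced with a small C model of the loop. This is a sketch under assumptions, not the php_pcre.c code: match_wb() is a hypothetical stand-in for pcre_exec() that only understands the pattern \b on ASCII input, and replace_wb_model() imitates the g_notempty retry logic with either check.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Word-character test that treats out-of-range indexes as non-word. */
static int is_word(const char *s, size_t n, long i) {
    return i >= 0 && (size_t)i < n &&
           (isalnum((unsigned char)s[i]) || s[i] == '_');
}

/*
 * Hypothetical stand-in for pcre_exec() on the pattern \b only.
 * \b can only match the empty string, so a NOTEMPTY attempt always fails.
 */
static int match_wb(const char *s, size_t n, size_t start, int notempty,
                    size_t off[2]) {
    if (notempty) return 0;
    for (size_t p = start; p <= n; p++) {
        if (is_word(s, n, (long)p - 1) != is_word(s, n, (long)p)) {
            off[0] = off[1] = p;         /* empty match at the boundary */
            return 1;
        }
    }
    return 0;
}

/*
 * Model of the replace loop. proposed == 0 uses the current test
 * (offsets[1] == offsets[0]); proposed != 0 uses (offsets[1] == start_offset).
 * out must have room for at least 3 * strlen(s) + 3 bytes.
 */
static void replace_wb_model(const char *s, int proposed, char *out) {
    assert(s != NULL && out != NULL);
    size_t n = strlen(s), start = 0, o = 0, off[2];
    int notempty = 0;
    for (;;) {
        if (match_wb(s, n, start, notempty, off)) {
            memcpy(out + o, s + start, off[0] - start); /* text before match */
            o += off[0] - start;
            out[o++] = '*';                             /* the replacement */
            notempty = proposed ? (off[1] == start) : (off[1] == off[0]);
            start = off[1];
        } else if (notempty && start < n) {
            out[o++] = s[start++];  /* retry failed: copy one char, move on */
            notempty = 0;
        } else {
            break;
        }
    }
    strcpy(out + o, s + start);     /* copy the unmatched tail */
}
```

In this model, replace_wb_model("(#11/19/2002#)", 1, out) produces (#**11**/*19**/*2002**#): when a boundary lies ahead of start_offset, the proposed check leaves NOTEMPTY unset, so the next attempt starts exactly on that boundary and matches it a second time. With proposed set to 0 the model produces (#*11*/*19*/*2002*#).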