[2013-11-20 00:01 UTC] danielklein at airpost dot net
Description:
------------
With the /u modifier, the regex engine appears to test every byte position within the UTF-8 encoded character, backtracking until it finds a valid starting byte and then checking from there. As a result, the replacement is inserted between the bytes of the character. If you remove the improperly inserted characters, the remaining bytes do encode the original character correctly.
Once it has matched at the beginning of the string, it should then move ahead by a character, not by a byte.
Test script:
---------------
<?php
// Sinhala characters
print(preg_replace('/(?<!ක)/u', '*', 'ක') . "\n"); // Works properly
print(preg_replace('/(?<!ක)/u', '*', 'ම') . "\n"); // Triggers the bug
// English characters
print(preg_replace('/(?<!k)/u', '*', 'k') . "\n"); // Works properly
print(preg_replace('/(?<!k)/u', '*', 'm') . "\n"); // Works properly
?>
Expected result:
----------------
*ක
*ම*
*k
*m*
Actual result:
--------------
*ක
*�*�*�*
*k
*m*
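
The garbled line can be explained at the byte level. The following is a minimal sketch, assuming the engine inserted '*' at every byte offset of 'ම' (U+0DB8, UTF-8 bytes E0 B6 B8). The buggy string is built by hand here, so the snippet does not depend on the bug being present in the PHP version that runs it:

```php
<?php
// 'ම' (U+0DB8) encoded in UTF-8 is the three bytes E0 B6 B8.
$char = "\xE0\xB6\xB8";

// Hand-built copy of the reported output: '*' at byte offsets 0, 1, 2 and 3.
$buggy = "*\xE0*\xB6*\xB8*";

// With /u, preg_match() returns false when the subject is not valid UTF-8.
$buggyIsValid = preg_match('//u', $buggy) !== false;

// Stripping the inserted asterisks recovers the original bytes.
$recovered = str_replace('*', '', $buggy);

var_dump(bin2hex($char));        // string(6) "e0b6b8"
var_dump($buggyIsValid);         // bool(false)
var_dump($recovered === $char);  // bool(true)
```

Each of the three bytes is shown as a replacement character because, once separated by '*', none of them forms a valid UTF-8 sequence on its own.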
The bug is also in preg_match_all(). See the following code:
<?php
preg_match_all('/(?<!ක)/u', 'ම', $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);
?>
There should be only two matches, not four (both empty strings), and the second one should have an offset of 3 (bytes).
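
The expected offsets of 0 and 3 follow from where characters start in the UTF-8 byte string. The following is a minimal sketch, assuming valid UTF-8 input. utf8_boundaries() is a hypothetical helper written for illustration, not a PHP built-in, and it does not use PCRE at all:

```php
<?php
// Hypothetical helper: byte offsets at which a UTF-8 character starts,
// plus the end-of-string offset. Assumes $s is valid UTF-8.
function utf8_boundaries(string $s): array
{
    $offsets = [];
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        // Continuation bytes have the bit pattern 10xxxxxx; skip them.
        if ((ord($s[$i]) & 0xC0) !== 0x80) {
            $offsets[] = $i;
        }
    }
    $offsets[] = $len; // the position after the last character
    return $offsets;
}

var_dump(utf8_boundaries("\xE0\xB6\xB8")); // 'ම': offsets 0 and 3
var_dump(utf8_boundaries('m'));            // offsets 0 and 1
```

These are the only positions at which an empty match should be attempted in /u mode. Since 'ක' never precedes either position in 'ම', the lookbehind succeeds at both, which gives the two expected matches.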