[2013-11-20 00:01 UTC] danielklein at airpost dot net
Description:
------------
With the /u modifier, the match appears to be attempted at every byte position within the UTF-8 encoded character: at each position it backtracks until a valid starting byte is found, then checks that character. As a result, the replacement is inserted between the bytes of the character. If you remove the improperly inserted characters, the remaining bytes do encode the original character correctly.
Once it has matched at the beginning of the string, it should then advance by a character, not by a byte.
Test script:
---------------
<?php
// Sinhala characters
print(preg_replace('/(?<!ක)/u', '*', 'ක') . "\n"); // Works properly
print(preg_replace('/(?<!ක)/u', '*', 'ම') . "\n"); // Triggers the bug
// English characters
print(preg_replace('/(?<!k)/u', '*', 'k') . "\n"); // Works properly
print(preg_replace('/(?<!k)/u', '*', 'm') . "\n"); // Works properly
?>
Expected result:
----------------
*ක
*ම*
*k
*m*
Actual result:
--------------
*ක
*�*�*�*
*k
*m*
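The difference between the expected and actual results can be reproduced without preg_replace(). The following is only an illustrative sketch: star_at_offsets() is a hypothetical helper, not a PHP function, and the file is assumed to be saved as UTF-8. It inserts '*' either at character boundaries only or at every byte offset; the second case breaks the three-byte character apart, as in the actual result above.

```php
<?php
// Hypothetical helper for illustration only: insert '*' before each of the
// given byte offsets of $s (an offset equal to strlen($s) appends one).
function star_at_offsets($s, array $offsets)
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i <= $len; $i++) {
        if (in_array($i, $offsets, true)) {
            $out .= '*';
        }
        if ($i < $len) {
            $out .= $s[$i];
        }
    }
    return $out;
}

$s = 'ම'; // three bytes in UTF-8 (assumes this file is saved as UTF-8)

// Character boundaries only: the start and the end of the string.
$perCharacter = star_at_offsets($s, [0, strlen($s)]);
// Every byte offset, which is what the actual result looks like.
$perByte = star_at_offsets($s, range(0, strlen($s)));

var_dump($perCharacter === '*ම*');               // bool(true)
var_dump(preg_match('//u', $perByte) === false); // bool(true): not valid UTF-8
echo bin2hex($perByte), "\n";                    // shows '2a' between every byte
```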
Copyright © 2001-2026 The PHP Group. All rights reserved.
Last updated: Sun Jan 18 16:00:01 2026 UTC
The bug is also in preg_match_all(). See the following code:
<?php
preg_match_all('/(?<!ක)/u', 'ම', $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);
?>
There should be only two matches (both empty strings), not four, and the second one should have an offset of 3 (bytes).
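As a possible stopgap on affected versions, the spurious matches could be filtered out by keeping only offsets that fall on a character boundary. This is a sketch under assumptions, not a fix for the underlying bug: it assumes the subject is valid UTF-8 and that the file is saved as UTF-8, and it relies on preg_match() with the /u modifier returning false for a prefix that ends in the middle of a character.

```php
<?php
// Assumes this file is saved as UTF-8 and $subject is valid UTF-8.
$subject = 'ම';
preg_match_all('/(?<!ක)/u', $subject, $matches, PREG_OFFSET_CAPTURE);

// Keep only the offsets that fall on a character boundary. A prefix that
// ends mid-character is not valid UTF-8, so preg_match('//u', ...) returns
// false for it; a valid (or empty) prefix returns 1.
$offsets = [];
foreach ($matches[0] as $m) {
    if (preg_match('//u', substr($subject, 0, $m[1])) === 1) {
        $offsets[] = $m[1];
    }
}

print_r($offsets); // offsets 0 and 3
```

On a version without the bug, the filter leaves the result unchanged, since both matches are already at offsets 0 and 3.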