[2008-07-06 10:17 UTC] hayk at mail dot ru
Description:
------------
mb_eregi sometimes matches [[:space:]] inside a UTF-8 string that contains no whitespace. This happens when the string contains the Russian capital letter "Р" (chr(0xD0) . chr(0xA0) in UTF-8; it is not the Latin "P").
Reproduce code:
---------------
<?php
mb_internal_encoding('UTF-8');
header('Content-Type: text/html; charset=utf-8');
$x = 'ПРИРОДА'; //$x = chr(0xD0) . chr(0x9F) . chr(0xD0) . chr(0xA0) . chr(0xD0) . chr(0x98) . chr(0xD0) . chr(0xA0) . chr(0xD0) . chr(0x9E) . chr(0xD0) . chr(0x94) . chr(0xD0) . chr(0x90);
$r = mb_eregi('^([^[:space:]]+([[:space:]][ivx]+)?)(.*)$', $x, $m);
print_r($m);
?>
Expected result:
----------------
Array
(
[0] => ПРИРОДА
[1] => ПРИРОДА
[2] =>
[3] =>
)
Actual result:
--------------
Array
(
[0] => ПРИРОДА
[1] => П�
[2] =>
[3] => �ИРОДА
)
mb_internal_encoding() does not reset the regex encoding that is set with mb_regex_encoding(), unlike the ini setting "mbstring.internal_encoding". This is confusing and should be documented somewhere. Try the following code with any one of the numbered portions uncommented to see the difference:
<?php
// (1)
// mb_internal_encoding('UTF-8');

// (2)
// mb_internal_encoding('UTF-8');
// mb_regex_encoding('UTF-8');

// (3)
// ini_set('mbstring.internal_encoding', 'UTF-8');

var_dump(mb_internal_encoding());
var_dump(mb_regex_encoding());
$x = "\xd0\x9f\xd0\xa0\xd0\x98\xd0\xa0\xd0\x9e\xd0\x94\xd0\x90";
$r = mb_eregi('^([^[:space:]]+([[:space:]][ivx]+)?)(.*)$', $x, $m);
var_dump($r, $m);
?>
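A minimal workaround sketch based on variant (2) above, assuming the mbstring extension is built with regex support: set the regex encoding explicitly before calling mb_eregi, then check that the first capture group covers the whole input. The variable name $matched is only illustrative. The likely cause of the split in the report is that the 0xA0 byte of "Р" is treated as whitespace when the regex encoding is not UTF-8, but that is an inference from the reported output, not something confirmed in this thread.

```php
<?php
// Set both encodings explicitly; mb_internal_encoding() alone
// does not change the encoding used by the mb_ereg* functions.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// "ПРИРОДА" as raw UTF-8 bytes, same input as in the report.
$x = "\xd0\x9f\xd0\xa0\xd0\x98\xd0\xa0\xd0\x9e\xd0\x94\xd0\x90";

$matched = mb_eregi('^([^[:space:]]+([[:space:]][ivx]+)?)(.*)$', $x, $m);

// With the regex encoding set to UTF-8, group 1 should be the full
// string rather than being cut in the middle of the "Р" byte sequence.
var_dump(mb_regex_encoding());
var_dump($matched !== false && $m[1] === $x);
?>
```

If the second var_dump prints bool(false), comparing bin2hex($m[1]) with bin2hex($x) shows at which byte the match was cut.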