Doc Bug #45438 mb_eregi finds space
Submitted: 2008-07-06 10:17 UTC Modified: 2008-08-02 10:23 UTC
From: hayk at mail dot ru Assigned:
Status: Not a bug Package: Documentation problem
PHP Version: 5.2.6 OS: CentOS 5.1 (x64)
Private report: No CVE-ID: None
 [2008-07-06 10:17 UTC] hayk at mail dot ru
Description:
------------
Sometimes mb_eregi finds a space in a string that is UTF-8 encoded and contains the Russian capital letter "Р" (chr(0xD0) . chr(0xA0) in UTF-8; it's not the English "P").

Reproduce code:
---------------
<?php
	mb_internal_encoding('UTF-8');
	header('Content-Type: text/html; charset=utf-8');
	$x = 'ПРИРОДА'; //$x = chr(0xD0) . chr(0x9F) . chr(0xD0) . chr(0xA0) . chr(0xD0) . chr(0x98) . chr(0xD0) . chr(0xA0) . chr(0xD0) . chr(0x9E) . chr(0xD0) . chr(0x94) . chr(0xD0) . chr(0x90);
	$r = mb_eregi('^([^[:space:]]+([[:space:]][ivx]+)?)(.*)$', $x, $m);
	print_r($m);
?>

Expected result:
----------------
Array
(
    [0] => природа
    [1] => природа
    [2] => 
    [3] => 
)

Actual result:
--------------
Array
(
    [0] => ПРИРОДА
    [1] => П�
    [2] => 
    [3] => �ИРОДА
)
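
The actual result is consistent with the string being split at the first 0xA0 byte, that is, in the middle of the two-byte UTF-8 sequence for "Р". A minimal sketch using only plain byte functions (no mbstring) shows where that split falls. It inspects the bytes only and makes no claim about why the regex engine treated 0xA0 as whitespace. The variable names are illustrative.

```php
<?php
// Bytes of the test string from the report: "ПРИРОДА" encoded as UTF-8.
$x = "\xd0\x9f\xd0\xa0\xd0\x98\xd0\xa0\xd0\x9e\xd0\x94\xd0\x90";

// First occurrence of byte 0xA0, the second byte of "Р" (0xD0 0xA0).
$offset = strpos($x, "\xa0");

// Splitting at that byte gives the two halves seen in the actual result.
$head = bin2hex(substr($x, 0, $offset)); // "П" plus a dangling 0xD0 lead byte
$tail = bin2hex(substr($x, $offset));    // stray 0xA0 followed by "ИРОДА"

echo $offset, "\n", $head, "\n", $tail, "\n";
```

The dangling 0xD0 and the stray 0xA0 are each invalid UTF-8 on their own, which is why both halves are displayed with a replacement character.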

 [2008-08-01 23:23 UTC] moriyoshi@php.net
Unlike the ini setting "mbstring.internal_encoding", mb_internal_encoding() doesn't reset the regex encoding, which is set separately via mb_regex_encoding(). This is really confusing and should have been documented somewhere.

Try the following code with any one of the numbered portions uncommented to see the difference:

<?php
// (1)
// mb_internal_encoding('UTF-8');
// (2)
// mb_internal_encoding('UTF-8'); mb_regex_encoding('UTF-8');
// (3)
// ini_set('mbstring.internal_encoding', 'UTF-8');
var_dump(mb_internal_encoding());
var_dump(mb_regex_encoding());
$x = "\xd0\x9f\xd0\xa0\xd0\x98\xd0\xa0\xd0\x9e\xd0\x94\xd0\x90";
$r = mb_eregi('^([^[:space:]]+([[:space:]][ivx]+)?)(.*)$', $x, $m);
var_dump($r, $m);
?>
 [2008-08-02 10:23 UTC] hayk at mail dot ru
Thanks, using mb_regex_encoding() helped.
It's strange that I see this "bug" only on 64-bit OSes.
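
The fix confirmed in this thread can be sketched as follows: set the regex encoding explicitly next to the internal encoding before calling any mb_ereg* function. This is a minimal sketch that assumes the mbstring extension is loaded. It reuses the pattern and test string from the report.

```php
<?php
// Set both encodings explicitly: per the comment above, calling
// mb_internal_encoding() alone leaves the regex encoding untouched.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// "ПРИРОДА" encoded as UTF-8, as in the report.
$x = "\xd0\x9f\xd0\xa0\xd0\x98\xd0\xa0\xd0\x9e\xd0\x94\xd0\x90";
$r = mb_eregi('^([^[:space:]]+([[:space:]][ivx]+)?)(.*)$', $x, $m);

// With a UTF-8 regex encoding the first group should cover the whole word,
// because no character in it is whitespace.
var_dump($m[1] === $x);
```

Calling mb_regex_encoding() with no argument returns the current regex encoding, which is a quick way to check what the mb_ereg* functions will actually use.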