Doc Bug #45438 mb_eregi finds space
Submitted: 2008-07-06 10:17 UTC Modified: 2008-08-02 10:23 UTC
From: hayk at mail dot ru Assigned:
Status: Not a bug Package: Documentation problem
PHP Version: 5.2.6 OS: CentOS 5.1 (x64)
Private report: No CVE-ID: None
 [2008-07-06 10:17 UTC] hayk at mail dot ru
Description:
------------
mb_eregi sometimes matches [[:space:]] inside a UTF-8 string that contains the Russian capital letter "Р" (chr(0xD0) . chr(0xA0) in UTF-8; this is not the Latin "P").

Reproduce code:
---------------
<?php
	mb_internal_encoding('UTF-8');
	header('Content-Type: text/html; charset=utf-8');
	$x = 'ПРИРОДА'; // same bytes as: $x = chr(0xD0) . chr(0x9F) . chr(0xD0) . chr(0xA0) . chr(0xD0) . chr(0x98) . chr(0xD0) . chr(0xA0) . chr(0xD0) . chr(0x9E) . chr(0xD0) . chr(0x94) . chr(0xD0) . chr(0x90);
	$r = mb_eregi('^([^[:space:]]+([[:space:]][ivx]+)?)(.*)$', $x, $m);
	print_r($m);
?>

Expected result:
----------------
Array
(
    [0] => природа
    [1] => природа
    [2] => 
    [3] => 
)

Actual result:
--------------
Array
(
    [0] => ПРИРОДА
    [1] => П�
    [2] => 
    [3] => �ИРОДА
)
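The replacement characters in the actual result suggest the match was cut in the middle of a multibyte character. The sketch below is only a byte-level illustration, not a statement about the engine's internals: it splits the test string at the first 0xA0 byte (the second byte of "Р") and shows that the two halves have the same shape as groups [1] and [3]. The variable names $cut, $head and $tail are made up for this example. It is an assumption here that the regex engine read the string as a single-byte encoding; in ISO-8859-1, for instance, 0xA0 is the no-break space.

```php
<?php
// Raw UTF-8 bytes of the test string from the report ("ПРИРОДА").
$x = "\xd0\x9f\xd0\xa0\xd0\x98\xd0\xa0\xd0\x9e\xd0\x94\xd0\x90";

// Byte offset of the first 0xA0, which is the second byte of "Р".
$cut = strpos($x, "\xa0");

// Splitting there mirrors the reported groups [1] and [3]:
$head = substr($x, 0, $cut); // "П" followed by a dangling 0xD0 lead byte
$tail = substr($x, $cut);    // a stray 0xA0, then the remaining letters

echo bin2hex($head), "\n"; // d09fd0
echo bin2hex($tail), "\n"; // a0d098d0a0d09ed094d090
```

Both halves are invalid UTF-8 on their own, which is why the output shows a replacement character at the end of [1] and at the start of [3].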

 [2008-08-01 23:23 UTC] moriyoshi@php.net
Unlike the ini setting "mbstring.internal_encoding", mb_internal_encoding() does not reset the regex encoding, which has to be set separately with mb_regex_encoding(). This is really confusing and should have been documented somewhere.

Try the following code with any one of the numbered portions uncommented to see the difference:

<?php
// (1)
// mb_internal_encoding('UTF-8');
// (2)
// mb_internal_encoding('UTF-8'); mb_regex_encoding('UTF-8');
// (3)
// ini_set('mbstring.internal_encoding', 'UTF-8');
var_dump(mb_internal_encoding());
var_dump(mb_regex_encoding());
$x = "\xd0\x9f\xd0\xa0\xd0\x98\xd0\xa0\xd0\x9e\xd0\x94\xd0\x90";
$r = mb_eregi('^([^[:space:]]+([[:space:]][ivx]+)?)(.*)$', $x, $m);
var_dump($r, $m);
?>
 [2008-08-02 10:23 UTC] hayk at mail dot ru
Thanks, using mb_regex_encoding() helped.
It's strange that I see this "bug" only on 64-bit OSes.
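For reference, the fix confirmed above can be sketched as follows. This is a minimal example that assumes the mbstring extension is built with regex support. It sets the regex encoding explicitly rather than relying on mb_internal_encoding() to do so, then checks that the whole word lands in the first group.

```php
<?php
// Set both encodings explicitly. Per the comment above, calling
// mb_internal_encoding() alone does not change the encoding used
// by the mb_ereg* family.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// Same test string as in the report, written as raw UTF-8 bytes.
$x = "\xd0\x9f\xd0\xa0\xd0\x98\xd0\xa0\xd0\x9e\xd0\x94\xd0\x90";
$r = mb_eregi('^([^[:space:]]+([[:space:]][ivx]+)?)(.*)$', $x, $m);

// Show the active regex encoding and whether group 1 is the whole word.
var_dump(mb_regex_encoding());
var_dump($m[1] === $x);
```

Calling mb_regex_encoding() once near the start of the script, next to mb_internal_encoding(), keeps the two settings from drifting apart.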
 