Bug #77093 mb_ereg_replace() does not work with SJIS-win(cp932)
Submitted: 2018-11-02 01:32 UTC Modified: 2018-11-05 01:37 UTC
Avg. Score:4.5 ± 0.5
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:2 (100.0%)
From: ryosuke dot kobayashi at fujisystems dot co dot jp Assigned:
Status: Re-Opened Package: mbstring related
PHP Version: 7.2.11 OS: Linux
Private report: No CVE-ID: None

 [2018-11-02 01:32 UTC] ryosuke dot kobayashi at fujisystems dot co dot jp
mb_ereg_replace() returns null when the given string contains certain 'SJIS-win' characters (e.g. 'ⅰ', '伃'), and it does so without emitting any error.
I believe the characters between 0xFA40 ('ⅰ') and 0xFC4B ('黑') trigger this bug.

It also happens with mb_ereg_match*().

I confirmed that this happens on PHP 7.1 or higher. Here are the results of my tests.

PHP 7.0.32 with oniguruma 5.9.6  => It works.
PHP 7.0.32 with oniguruma 6.3.0  => It works.
PHP 7.1.23 with oniguruma 5.9.6  => It does not work.
PHP 7.2.11 with oniguruma 6.3.0  => It does not work.

Test script:
function chk($a,$b){
  $hex = hex2bin($s);
    if (!mb_ereg($hex, $hex)){
      echo "$s($hex):NG\n";
echo "cnt:$j\n";


 [2018-11-02 07:53 UTC]
-Status: Open +Status: Not a bug
 [2018-11-02 07:53 UTC]
You need to use mb_regex_encoding() to specify the mb_regex encoding.
 [2018-11-02 08:06 UTC]
Mbstring's internal encoding and regex encoding are independent.

Before 5.6/7.0, mbregex's default encoding was EUC-JP, which is ISO 8859-1 compatible, so it worked in most cases. As of 5.6/7.0, I changed the default to UTF-8.

Anyway, to ensure correct operation, you need to set the correct encoding via mb_regex_encoding() for mb_ereg*(). I might have to look into where the difference came from, though.

If you notice anything wrong, please let us know.
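
A minimal sketch of the advice above, assuming a PHP build with mbstring and mbregex enabled. The helper name replace_in_sjis_win() is hypothetical and only serves the illustration. The sample bytes are ordinary hiragana, not the characters from the report, so this shows only how the regex encoding is set, not the bug itself.

```php
<?php
// Hypothetical helper: set the regex encoding explicitly before replacing.
// Note that mb_regex_encoding() changes a global setting and is independent
// of mb_internal_encoding().
function replace_in_sjis_win(string $pattern, string $replacement, string $subject)
{
    mb_regex_encoding('SJIS-win');
    return mb_ereg_replace($pattern, $replacement, $subject);
}

// Shift_JIS bytes: 0x82A0 = "あ", 0x82A2 = "い", 0x82A4 = "う".
// Replace "あ" with "う" in the string "あい".
$result = replace_in_sjis_win("\x82\xA0", "\x82\xA4", "\x82\xA0\x82\xA2");
```

The result can be inspected with bin2hex($result), which avoids terminal encoding problems when the console is not running in Shift_JIS.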
 [2018-11-02 09:17 UTC] ryosuke dot kobayashi at fujisystems dot co dot jp
Thank you for your reply.

>You need to use mb_regex_encoding() to specify the mb_regex encoding.
Exactly... I forgot to add this because my environment runs on SJIS-win.

I fixed my test code, and tried it here.

Now the problem I wanted to point out occurs.

Best regards.
 [2018-11-02 09:54 UTC]
-Status: Not a bug +Status: Re-Opened
 [2018-11-02 09:54 UTC]
<> looks like a bug.
 [2018-11-03 03:34 UTC]
It seems encoding validation is failing somehow and returning FALSE for it.
 [2018-11-03 23:16 UTC]
I briefly checked the code. The difference seems to come from the encodings supported by mbstring and Oniguruma: mbstring has an 'SJIS-win' encoding, while Oniguruma has only 'SJIS'. All SJIS variants are validated as 'SJIS'.

As a result, the current (newer) code tries to validate 'SJIS-win' data as 'SJIS', which fails in certain cases.

The following code should be fixed to address this bug, i.e. php_mb_check_encoding() needs to receive 'SJIS-win' from '_php_mb_regex_mbctype2name(MBREX(current_mbctype))' in this case, not 'SJIS'.

	if (!php_mb_check_encoding(
	)) {

Using 'SJIS' as the mbregex encoding wouldn't fix the issue.
There is probably another issue as well.
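
The mismatch described above can be probed from userland with a rough sketch like the following. It assumes mbstring is loaded, and validate_under() is a hypothetical helper, not an mbstring function. The verdict for 'SJIS' may differ between PHP versions, so no result is claimed for it here.

```php
<?php
// Hypothetical helper: validate one byte sequence under several mbstring
// encoding names and collect the verdicts, keyed by encoding name.
function validate_under(string $bytes, array $encodings): array
{
    $out = [];
    foreach ($encodings as $enc) {
        $out[$enc] = mb_check_encoding($bytes, $enc);
    }
    return $out;
}

// 0xFA40 is the first code point of the range named in the report.
$report = validate_under("\xFA\x40", ['SJIS', 'SJIS-win']);
```

If $report['SJIS'] and $report['SJIS-win'] differ on a given build, that is consistent with the explanation that validating 'SJIS-win' input under the name 'SJIS' rejects some characters.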
 [2018-11-03 23:22 UTC]
I suppose returning the exact encoding, i.e. SJIS-win, from _php_mb_regex_mbctype2name(MBREX(current_mbctype)) would fix this bug.
 [2018-11-05 01:13 UTC] ryosuke dot kobayashi at fujisystems dot co dot jp
Thanks for investigating.

I understand the reason now.

So, do these results come from the same cause?
 [2018-11-05 01:37 UTC]
I suppose it would be fixed as well, since mb_ereg_replace() (and its variants) returns NULL for an invalid encoding.
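
Because a NULL return is easy to miss, calling code may want to check for it explicitly. The following is a sketch only, assuming mbstring with mbregex. ereg_replace_checked() is a hypothetical wrapper and not part of PHP. It validates the subject with mbstring's own encoding name first, then treats any non-string result as an error.

```php
<?php
// Hypothetical wrapper: fail loudly instead of silently returning NULL/false.
function ereg_replace_checked(string $pattern, string $replacement, string $subject, string $encoding): string
{
    // Validate against the exact mbstring encoding name (e.g. 'SJIS-win').
    if (!mb_check_encoding($subject, $encoding)) {
        throw new InvalidArgumentException("Subject is not valid $encoding");
    }
    $previous = mb_regex_encoding();   // remember the global regex encoding
    mb_regex_encoding($encoding);
    try {
        $result = mb_ereg_replace($pattern, $replacement, $subject);
    } finally {
        mb_regex_encoding($previous);  // restore it for other callers
    }
    if (!is_string($result)) {
        throw new RuntimeException('mb_ereg_replace() returned ' . var_export($result, true));
    }
    return $result;
}
```

On an affected build, a call such as ereg_replace_checked($pattern, $replacement, $subject, 'SJIS-win') with one of the reported characters would be expected to raise the RuntimeException rather than pass NULL along, though that has not been verified here.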