PHP :: Bug #69151 :: mb_ereg should reject ill-formed byte sequence

Bug #69151

mb_ereg should reject ill-formed byte sequence

Submitted:

2015-03-01 13:03 UTC

Modified:

2016-07-17 16:30 UTC

Votes:	1
Avg. Score:	3.0 ± 0.0
Reproduced:	0 of 0 (0.0%)

From:

masakielastic at gmail dot com

Assigned:

ab (profile)

Status:

Closed

Package:

mbstring related

PHP Version:

5.6.6

OS:

Mac OS X

Private report:

CVE-ID:

None


[2015-03-01 13:03 UTC] masakielastic at gmail dot com

Description:
------------
mb_ereg should return false, as preg_match does, if the argument is not a well-formed byte sequence, because Oniguruma does not check the validity of the encoding. The same rule applies to mb_ereg_replace and mb_ereg_search_init.

Test script:
---------------
<?php
// "\x80" is a lone continuation byte, which is ill-formed in UTF-8.
$str = "\x80";
var_dump(
    false === mb_eregi('.', $str, $matches),
    [] === $matches
);

Expected result:
----------------
bool(true)
bool(true)

Actual result:
--------------
bool(false)
bool(false)
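
For comparison, the preg_match behavior that the description refers to can be sketched as follows. This is an illustration only, assuming the pattern uses the /u (UTF-8) modifier; the variable names are hypothetical and not part of the original report. With /u, preg_match returns false on an ill-formed subject and preg_last_error() reports PREG_BAD_UTF8_ERROR.

```php
<?php
// A lone 0x80 byte is not well-formed UTF-8.
$subject = "\x80";

// The /u modifier makes PCRE treat both pattern and subject as UTF-8
// and validate the subject before matching.
$result = preg_match('/./u', $subject);

// preg_match() signals the failure by returning false (not 0).
var_dump($result === false);

// The specific reason is available through preg_last_error().
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);
```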

Pull requests:

Bug #69151 - mb_ereg should reject ill-formed byte sequence (php-src/1131)

History


[2015-03-28 21:01 UTC] yohgaki@php.net

While we could check encoding validity inside these functions, doing so costs CPU time.

The best approach is to check encoding validity when the program accepts input, i.e. users must ensure that the encoding is valid. Otherwise, PHP must check the encoding again and again. Therefore, I think it is better to document this behavior in the manual.

We may have Unicode support in the near future, and we should take this approach so that we don't repeat the PHP 6 failure.

I wrote a type affinity RFC for PHP:
https://wiki.php.net/rfc/introduce-type-affinity
Encoding validation for $_GET/$_POST/etc may be handled by this RFC.
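
The "validate once at the input boundary" approach described in this comment might look like the sketch below. It assumes the application works in UTF-8 and uses mb_check_encoding() from mbstring; the helper name require_valid_utf8() is hypothetical and is not part of PHP or of the RFC.

```php
<?php
/**
 * Hypothetical helper: reject ill-formed UTF-8 as soon as input arrives,
 * so later string functions do not need to re-validate it.
 */
function require_valid_utf8(string $input): string
{
    // mb_check_encoding() returns false for ill-formed byte sequences.
    if (!mb_check_encoding($input, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not well-formed UTF-8');
    }
    return $input;
}

// Usage: validate request data once, then pass it on.
try {
    $name = require_valid_utf8($_GET['name'] ?? '');
} catch (InvalidArgumentException $e) {
    $name = '';
}
```

The trade-off is the one the comment names: the cost of validation is paid once per input rather than on every function call, but every entry point has to remember to do it.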

[2016-07-17 16:30 UTC] ab@php.net

-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: ab

[2016-07-17 16:30 UTC] ab@php.net

The PR is merged to master. While type affinity is a nice approach, the potential vulnerability in mb_ereg() still persists. The data source can be arbitrary, which is especially concerning with search/replace functionality. Clearly, checking the subject string beforehand adds some overhead, but because the mbstring ereg functions are based on Oniguruma, there is no chance of integrating the check into the processing itself the way htmlspecialchars() does. Thus a pre-check is the only option. A smarter check algorithm could be desirable, though. Another such case, perhaps to a smaller extent, is ICU, which likewise recommends using the conversion APIs for longer strings.

Thanks.
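
The pre-check idea from this comment can be approximated in userland as shown below. This is only a sketch, not the merged patch: it assumes the regex encoding is UTF-8, and the wrapper name checked_mb_ereg() is hypothetical.

```php
<?php
/**
 * Hypothetical wrapper: validate the subject before handing it to mb_ereg().
 * Assumes the multibyte regex encoding is UTF-8.
 */
function checked_mb_ereg(string $pattern, string $subject, ?array &$matches = null): bool
{
    // Pre-check: refuse ill-formed input instead of passing it to Oniguruma.
    if (!mb_check_encoding($subject, 'UTF-8')) {
        $matches = [];
        return false;
    }

    // mb_ereg() returns false when there is no match; normalize to bool
    // so the wrapper behaves the same across PHP versions.
    return mb_ereg($pattern, $subject, $matches) !== false;
}

var_dump(checked_mb_ereg('.', "\x80", $m)); // ill-formed subject is rejected
var_dump(checked_mb_ereg('a', 'abc', $m));  // well-formed subject is matched
```

The extra pass over the subject is the overhead mentioned above; it is paid on every call, which is why a cheaper validation routine would be attractive.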
