Bug #69151 mb_ereg should reject ill-formed byte sequence
Submitted: 2015-03-01 13:03 UTC Modified: 2016-07-17 16:30 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: masakielastic at gmail dot com Assigned: ab
Status: Closed Package: mbstring related
PHP Version: 5.6.6 OS: Mac OS X
Private report: No CVE-ID: None
 [2015-03-01 13:03 UTC] masakielastic at gmail dot com
Description:
------------
mb_ereg should return false, like preg_match, if the argument is not a well-formed byte sequence, since Oniguruma doesn't check the validity of the encoding. The same rule applies to mb_eregi, mb_ereg_replace, and mb_ereg_search_init.
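
For comparison, here is a minimal sketch of the preg_match behavior referred to above. It assumes the pattern uses the /u modifier, which is the mode in which PCRE validates the subject as UTF-8; without /u, preg_match treats the subject as raw bytes.

```php
<?php
// "\x80" is a lone continuation byte, so it is not valid UTF-8.
$str = "\x80";

// With the /u modifier, PCRE rejects the ill-formed subject up front.
$result = preg_match('/./u', $str, $matches);

var_dump($result === false);                          // no match attempted
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);  // reason for the failure
```

The report asks for the mb_ereg family to fail in the same way instead of passing the bytes to Oniguruma.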

Test script:
---------------
$str = "\x80";
var_dump(
    false === mb_eregi('.', $str, $matches),
    [] === $matches
);

Expected result:
----------------
bool(true)
bool(true)

Actual result:
--------------
bool(false)
bool(false)

History

 [2015-03-28 21:01 UTC] yohgaki@php.net
While we could check encoding validity inside these functions, doing so costs CPU time on every call.

The best approach is to check encoding validity when the program accepts input, i.e. users must ensure that the encoding is valid. Otherwise, PHP has to check the encoding again and again. Therefore, I think it is better to document this behavior in the manual.
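
A minimal sketch of what validating once at the input boundary might look like, assuming UTF-8 is the application encoding. The helper name accept_utf8_input is hypothetical and not part of PHP; mb_check_encoding is the existing mbstring function.

```php
<?php
// Hypothetical helper (not part of PHP): returns the input unchanged if it
// is well-formed UTF-8, or null so the caller can reject the request.
function accept_utf8_input(string $raw): ?string
{
    return mb_check_encoding($raw, 'UTF-8') ? $raw : null;
}

// Validate once when the value enters the program...
$query = accept_utf8_input("\x80");

// ...so later code can assume well-formed strings without re-checking.
var_dump($query === null);
```

This only helps for data that actually passes through such a boundary; strings from other sources would still reach the regex functions unchecked.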

We may have Unicode support in the near future, and we should take this approach so that we don't repeat the failure of PHP 6.

I wrote a type affinity RFC for PHP:
https://wiki.php.net/rfc/introduce-type-affinity
Encoding validation for $_GET/$_POST/etc may be handled by this RFC.
 [2016-07-17 16:30 UTC] ab@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: ab
 [2016-07-17 16:30 UTC] ab@php.net
The PR has been merged to master. While type affinity is a nice approach, the potential vulnerability in mb_ereg() still persists. The data source can be arbitrary, which is especially concerning for search/replace functionality. Checking the subject string beforehand clearly adds some overhead, but since the mbstring ereg functions are based on Oniguruma, there is no chance of integrating the check into the processing itself the way htmlspecialchars() does. Thus a pre-check is the only option. A smarter check algorithm would be desirable, though. For example, another such case, perhaps to a smaller extent, is ICU, which also recommends using the conversion APIs for longer strings.

Thanks.
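
The pre-check described above can be approximated in userland on versions without the fix. This is a minimal sketch, not the merged patch: the wrapper name checked_mb_ereg is hypothetical, and it assumes the regex encoding is UTF-8.

```php
<?php
// Hypothetical userland wrapper (not part of mbstring): validate the
// subject before handing it to the Oniguruma-backed mb_ereg().
function checked_mb_ereg(string $pattern, string $subject, ?array &$matches = null): bool
{
    $matches = [];

    // Pre-check: refuse ill-formed byte sequences instead of matching on them.
    if (!mb_check_encoding($subject, 'UTF-8')) {
        return false;
    }

    // Assumption: UTF-8 is the intended regex encoding. Note that this call
    // changes the global mbstring regex encoding.
    mb_regex_encoding('UTF-8');

    // mb_ereg() returns false when there is no match.
    return mb_ereg($pattern, $subject, $matches) !== false;
}

var_dump(checked_mb_ereg('.', "\x80", $m)); // ill-formed subject is rejected
var_dump(checked_mb_ereg('b+', 'abbc', $m)); // well-formed subject is matched
```

The check scans the whole subject on every call, which is the overhead discussed in the comments above.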
 