mb_ereg should return false, like preg_match, if the argument is not a well-formed byte sequence, since Oniguruma doesn't check the validity of the encoding. The same rule applies to mb_ereg_replace and mb_ereg_search_init.
$str = "\x80";
var_dump(
    false === mb_eregi('.', $str, $matches),
    [] === $matches
);
While we could check encoding validity inside these functions, doing so costs CPU time on every call.
The best approach is to check encoding validity when the program accepts input, i.e. users must ensure encoding validity themselves. Otherwise, PHP must check the encoding again and again. Therefore, I think it is better to document this behavior in the manual.
We may have Unicode support in the near future, and we should take this approach so that we don't repeat the PHP 6 failure.
I wrote a type affinity RFC for PHP.
Encoding validation for $_GET/$_POST/etc. may be handled by this RFC.
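The input-time validation suggested in this comment could be sketched roughly as follows. This is only an illustration, not part of the RFC: the helper name input_is_well_formed() is hypothetical, UTF-8 is assumed as the expected encoding, and the only mbstring call used is mb_check_encoding().

```php
<?php
// Hypothetical helper (not a PHP built-in): returns true only if every
// string key and string value in $input, including nested arrays,
// is well-formed in $encoding.
function input_is_well_formed(array $input, string $encoding = 'UTF-8'): bool
{
    foreach ($input as $key => $value) {
        // Array keys from request data can carry ill-formed bytes too.
        if (is_string($key) && !mb_check_encoding($key, $encoding)) {
            return false;
        }
        if (is_array($value)) {
            // Recurse into nested parameters such as ?a[b]=c.
            if (!input_is_well_formed($value, $encoding)) {
                return false;
            }
        } elseif (is_string($value) && !mb_check_encoding($value, $encoding)) {
            return false;
        }
    }
    return true;
}
```

A script would call this once on $_GET or $_POST when the request arrives and reject the request on failure, so that later string functions do not each need to repeat the check.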
The PR is merged to master. While type affinity is a nice approach, the potential vulnerability in mb_ereg() still persists. The data source can be arbitrary, which is especially concerning for search/replace functionality. Checking the subject string beforehand clearly adds some overhead, but since the mbstring ereg functions are based on Oniguruma, there is no chance to integrate the check into the processing itself the way htmlspecialchars() does. Thus a pre-check is the only option. A smarter check algorithm would be desirable, though. For example, another such case, perhaps to a smaller extent, is ICU, which recommends using conversion APIs for longer strings as well.
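In userland, the pre-check described above can be approximated with a small wrapper. This is a sketch only, not the merged patch: the function name checked_mb_ereg() is hypothetical, and it assumes the subject should be validated against the current mb_regex_encoding().

```php
<?php
// Hypothetical wrapper (not part of mbstring): validates the subject
// against the current regex encoding before handing it to mb_ereg().
function checked_mb_ereg(string $pattern, string $subject, ?array &$matches = null)
{
    // mb_regex_encoding() without arguments returns the current regex encoding name.
    if (!mb_check_encoding($subject, mb_regex_encoding())) {
        $matches = [];   // report "no match" for ill-formed input
        return false;
    }
    // Well-formed subject: return whatever mb_ereg() itself returns.
    return mb_ereg($pattern, $subject, $matches);
}
```

With the regex encoding set to UTF-8, checked_mb_ereg('.', "\x80", $m) returns false and leaves $m as an empty array, which matches the behavior requested in the report. The cost is one extra pass over the subject per call.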