Request #73088 str_check_encoding and str_scrub
Submitted: 2016-09-15 10:26 UTC Modified: 2017-01-08 17:23 UTC
From: masakielastic at gmail dot com Assigned:
Status: Wont fix Package: mbstring related
PHP Version: Next Minor Version OS:
Private report: No CVE-ID: None
 [2016-09-15 10:26 UTC] masakielastic at gmail dot com
Description:
------------
htmlspecialchars has been able to check character encoding and replace ill-formed byte sequences (ENT_SUBSTITUTE) since PHP 5.4. This ability is useful not only for escaping special characters but in general. Mbstring has the same feature in mb_check_encoding and mb_scrub (PHP 7.2 dev).
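
As a small illustration of the existing behavior described above, the following sketch compares htmlspecialchars with and without ENT_SUBSTITUTE on an ill-formed UTF-8 string. The variable names are only for illustration.

```php
<?php
// "\x80" is a lone continuation byte, so this string is ill-formed UTF-8.
$str = "ab\x80cde";

// Without ENT_SUBSTITUTE (or ENT_IGNORE), ill-formed input yields an empty string.
$strict = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE, the ill-formed byte is replaced by U+FFFD.
$substituted = htmlspecialchars($str, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict === '');                    // bool(true)
var_dump($substituted === "ab\u{FFFD}cde");  // bool(true)
```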

bool str_check_encoding ( string $str [, string $encoding = ini_get("default_charset") ] )
string str_scrub ( string $str [, string $encoding = ini_get("default_charset") ] )
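
A rough userland sketch of what the two proposed functions might do, using only ext/standard and PCRE. The names sketch_str_check_encoding and sketch_str_scrub are hypothetical, the sketch handles UTF-8 only, and it assumes the scrub function returns the cleaned string as mb_scrub does. The number of U+FFFD characters substituted for longer ill-formed sequences may differ from mb_scrub.

```php
<?php
// Hypothetical stand-in for the proposed str_check_encoding (UTF-8 only).
function sketch_str_check_encoding(string $str): bool
{
    // With the /u modifier, PCRE validates the subject as UTF-8 and
    // preg_match returns false when the subject is ill-formed.
    return preg_match('//u', $str) === 1;
}

// Hypothetical stand-in for the proposed str_scrub (UTF-8 only).
function sketch_str_scrub(string $str): string
{
    // Escape with ENT_SUBSTITUTE so ill-formed bytes become U+FFFD,
    // then undo the escaping of &, < and > to get the text back.
    $escaped = htmlspecialchars($str, ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return htmlspecialchars_decode($escaped, ENT_NOQUOTES);
}

var_dump(sketch_str_check_encoding("ab\x80cde"));              // bool(false)
var_dump(sketch_str_scrub("ab\x80cde") === "ab\u{FFFD}cde");   // bool(true)
```

The round trip through htmlspecialchars is a workaround chosen for this sketch, not necessarily how a native implementation would work.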


History

 [2016-09-15 10:28 UTC] cmb@php.net
-Summary: mb_at and mb_codepoint_at
+Summary: mb_check_encoding and mb_scrub
 [2016-09-15 10:33 UTC] masakielastic at gmail dot com
-Summary: mb_check_encoding and mb_scrub
+Summary: str_check_encoding and str_scrub
 [2016-09-15 10:33 UTC] masakielastic at gmail dot com
fixed title.
 [2016-09-15 10:43 UTC] nikic@php.net
As these functions deal with multibyte strings, they naturally belong in the mbstring extension, which, as you mention, already has analogous functions. Why would these need to be mirrored into ext/standard?
 [2016-09-19 21:25 UTC] masakielastic at gmail dot com
@nikic

One of the targets is Symfony's polyfill-mbstring, which Drupal has adopted.

https://github.com/symfony/polyfill
https://api.drupal.org/api/drupal/vendor!symfony!polyfill-mbstring!Mbstring.php/class/Mbstring/8.2.x

Although mb_strlen and mb_substr can properly handle ill-formed byte sequences, PCRE's functions and iconv_strlen/iconv_substr can't handle them. These functions are used for emulating mbstring's functions. grapheme_strlen and grapheme_substr also can't handle them.

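$str = "ab\x80cde"; // "\x80" is an ill-formed byte in UTF-8

var_dump(
    6 === mb_strlen($str, 'UTF-8'),           // mbstring counts the bad byte as one character
    false === preg_match_all('/./su', $str),  // PCRE rejects the whole subject
    false === iconv_strlen($str, 'UTF-8'),    // iconv fails
    NULL === grapheme_strlen($str)            // intl returns NULL
);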

iconv can't handle ill-formed bytes properly even in 2016.

https://sourceware.org/bugzilla/show_bug.cgi?id=2373
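
To make the polyfill problem concrete, here is one possible way a userland emulation could count characters without failing on ill-formed input, using PCRE in byte mode (no /u). The helper name sketch_utf8_strlen is hypothetical. For the example string above it gives the same count as mb_strlen, but agreement on every ill-formed input is not guaranteed.

```php
<?php
// Hypothetical helper: counts well-formed UTF-8 sequences as one character
// each, and every byte outside such a sequence as one character as well.
function sketch_utf8_strlen(string $str): int
{
    // Byte-mode pattern: well-formed UTF-8 sequences first, then a
    // fallback that consumes any single remaining byte.
    $pattern = '/
          [\x00-\x7F]
        | [\xC2-\xDF][\x80-\xBF]
        | \xE0[\xA0-\xBF][\x80-\xBF]
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
        | \xED[\x80-\x9F][\x80-\xBF]
        | \xF0[\x90-\xBF][\x80-\xBF]{2}
        | [\xF1-\xF3][\x80-\xBF]{3}
        | \xF4[\x80-\x8F][\x80-\xBF]{2}
        | .
    /sx';

    return (int) preg_match_all($pattern, $str);
}

var_dump(sketch_utf8_strlen("ab\x80cde"));  // int(6)
```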

I will open other requests for adding an encoding option to substr (pull requests for chr and ord already exist) and for new functions (str_at, str_codepoint_at, str_each_char and str_each_codepoint), and I will rethink the value of this feature request.
 [2017-01-08 17:23 UTC] krakjoe@php.net
-Status: Open
+Status: Wont fix
 
Last updated: Thu Jan 02 19:01:28 2025 UTC