Request #73088 str_check_encoding and str_scrub
Submitted: 2016-09-15 10:26 UTC Modified: 2017-01-08 17:23 UTC
From: masakielastic at gmail dot com Assigned:
Status: Wont fix Package: mbstring related
PHP Version: Next Minor Version OS:
Private report: No CVE-ID: None
 [2016-09-15 10:26 UTC] masakielastic at gmail dot com
Description:
------------
htmlspecialchars has been able to check character encoding and replace ill-formed byte sequences (ENT_SUBSTITUTE) since PHP 5.4. This ability is useful not only for escaping special characters but for string handling in general. mbstring has the same features in mb_check_encoding and mb_scrub (PHP 7.2 dev).

bool str_check_encoding ( string $str [, string $encoding = ini_get("default_charset") ] )
string str_scrub ( string $str [, string $encoding = ini_get("default_charset") ] )
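
As a rough illustration of the behaviour requested here, the two functions could be approximated in userland. This is a sketch under assumptions, not the proposed implementation: the names sketch_str_check_encoding and sketch_str_scrub are hypothetical, only UTF-8 is handled (the $encoding parameter is left out), and the scrub step relies on the htmlspecialchars ENT_SUBSTITUTE behaviour described in the request.

```php
<?php
// Hypothetical userland sketches, UTF-8 only. Neither function exists in PHP.

function sketch_str_check_encoding(string $str): bool
{
    // With the /u modifier PCRE validates the subject as UTF-8 first;
    // preg_match() returns false for ill-formed input and 1 otherwise.
    return 1 === preg_match('//u', $str);
}

function sketch_str_scrub(string $str): string
{
    // ENT_SUBSTITUTE replaces ill-formed byte sequences with U+FFFD.
    $escaped = htmlspecialchars($str, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Undo the escaping of & < > " ' so that only the substitution remains.
    return htmlspecialchars_decode($escaped, ENT_QUOTES);
}

var_dump(sketch_str_check_encoding("abc"));                  // bool(true)
var_dump(sketch_str_check_encoding("ab\x80cde"));            // bool(false)
var_dump(sketch_str_scrub("ab\x80cde") === "ab\u{FFFD}cde"); // bool(true)
```

The escape and decode round trip is only there to reach the substitution logic inside htmlspecialchars; a native function would not need it.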


History

 [2016-09-15 10:28 UTC] cmb@php.net
-Summary: mb_at and mb_codepoint_at
+Summary: mb_check_encoding and mb_scrub
 [2016-09-15 10:33 UTC] masakielastic at gmail dot com
-Summary: mb_check_encoding and mb_scrub
+Summary: str_check_encoding and str_scrub
 [2016-09-15 10:33 UTC] masakielastic at gmail dot com
Fixed the title.
 [2016-09-15 10:43 UTC] nikic@php.net
As these functions deal with multibyte strings, they naturally belong in the mbstring extension, which, as you mention, already has analogous functions. Why would these need to be mirrored in ext/standard?
 [2016-09-19 21:25 UTC] masakielastic at gmail dot com
@nikic

One target is Symfony's polyfill-mbstring, which Drupal has adopted.

https://github.com/symfony/polyfill
https://api.drupal.org/api/drupal/vendor!symfony!polyfill-mbstring!Mbstring.php/class/Mbstring/8.2.x

Although mb_strlen and mb_substr handle ill-formed byte sequences properly, PCRE's functions and iconv_strlen/iconv_substr can't. These functions are used to emulate mbstring's functions. grapheme_strlen and grapheme_substr can't handle them either.

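// "\x80" is a lone continuation byte, so $str is ill-formed UTF-8.
$str = "ab\x80cde";

// mb_strlen still returns a length; the other three report failure.
var_dump(
    6 === mb_strlen($str, 'UTF-8'),
    false === preg_match_all('/./su', $str),
    false === iconv_strlen($str, 'UTF-8'),
    NULL === grapheme_strlen($str)
);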
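
To show why a scrub step would matter to a polyfill, here is a minimal sketch of a length function that replaces ill-formed bytes before counting with PCRE. The name sketch_strlen_utf8 is hypothetical and this is not how polyfill-mbstring is implemented; it assumes UTF-8 input and uses the htmlspecialchars ENT_SUBSTITUTE behaviour as a stand-in for a native scrub function.

```php
<?php
// Hypothetical helper, UTF-8 only: count code points after scrubbing.
function sketch_strlen_utf8(string $str): int
{
    // Replace each ill-formed sequence with U+FFFD, then undo the escaping.
    $clean = htmlspecialchars_decode(
        htmlspecialchars($str, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'),
        ENT_QUOTES
    );

    // On well-formed input preg_match_all() returns the number of matches,
    // which is one match per code point for /./su.
    return (int) preg_match_all('/./su', $clean);
}

var_dump(sketch_strlen_utf8("ab\x80cde")); // int(6)
```

With the ill-formed byte replaced first, the PCRE-based count agrees with the mb_strlen result above instead of failing.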

iconv still can't handle ill-formed bytes properly, even in 2016.

https://sourceware.org/bugzilla/show_bug.cgi?id=2373

I will open separate requests for adding an encoding option to substr (pull requests for chr and ord already exist) and for new functions (str_at, str_codepoint_at, str_each_char and str_each_codepoint), and I will rethink the value of this feature request.
 [2017-01-08 17:23 UTC] krakjoe@php.net
-Status: Open
+Status: Wont fix