Request #65081 New function for replacing ill-formed byte sequences with substitute characters
Submitted: 2013-06-21 03:20 UTC Modified: 2016-10-17 06:33 UTC
Avg. Score:2.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: masakielastic at gmail dot com Assigned: yohgaki (profile)
Status: Closed Package: mbstring related
PHP Version: 5.5.0 OS: All
Private report: No CVE-ID: None
 [2013-06-21 03:20 UTC] masakielastic at gmail dot com
A new function for replacing ill-formed byte sequences with substitute 
characters is needed. The problem with using mb_convert_encoding for that 
purpose is that the function name doesn't express the intent. Specifying the 
same encoding twice is verbose, and beginners may read it as a meaningless 
conversion.

$str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
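
To illustrate what the one-liner above does, here is a minimal sketch. It assumes the mbstring extension is loaded; the sample string and the variable names are made up for illustration and are not part of the report.

```php
<?php
// Use "?" explicitly instead of relying on the configured default.
mb_substitute_character(0x3F);

// 0xFF can never appear in well-formed UTF-8.
$dirty = "abc\xFFdef";

// Converting from UTF-8 to UTF-8 replaces the ill-formed byte.
$clean = mb_convert_encoding($dirty, 'UTF-8', 'UTF-8');

var_dump(mb_check_encoding($dirty, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($clean, 'UTF-8')); // bool(true)
```

Nothing in the call itself signals that its purpose is to clean the string, which is the readability problem described above.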

Ruby offers a precedent: Ruby 2.1 introduces String#scrub.

Whether the caller should be able to specify the substitute character still needs to be discussed.

function mb_scrub($str, $encoding = '', $substitute = '')
{
    if ('' === $encoding) {
        $encoding = mb_internal_encoding();
    }

    if ('' === $substitute) {
        $ret = mb_convert_encoding($str, $encoding, $encoding);
    } else {
        // Temporarily switch the substitute character, then restore it.
        $before_substitute = mb_substitute_character();
        mb_substitute_character($substitute);
        $ret = mb_convert_encoding($str, $encoding, $encoding);
        mb_substitute_character($before_substitute);
    }

    return $ret;
}
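
The custom-substitute branch can be tried in isolation. The sketch below assumes the mbstring extension and picks U+FFFD as the substitute purely for illustration; the report does not prescribe a particular character.

```php
<?php
$dirty = "abc\xFFdef"; // illustrative input with one ill-formed byte

// Remember the current setting so it can be restored afterwards.
$previous = mb_substitute_character();

// Use U+FFFD REPLACEMENT CHARACTER instead of the default.
mb_substitute_character(0xFFFD);
$clean = mb_convert_encoding($dirty, 'UTF-8', 'UTF-8');

// Restore the global setting to avoid side effects on other code.
mb_substitute_character($previous);

var_dump(mb_check_encoding($clean, 'UTF-8')); // bool(true)
```

Because mb_substitute_character() changes a global setting, saving and restoring it around the conversion keeps the change local to this one call.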

The same discussion applies to UConverter.

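function uconverter_scrub($str, $encoding, $opts = '')
{
    if ('' === $opts) {
        // No options given: rely on the converter's default substitution.
        return UConverter::transcode($str, $encoding, $encoding);
    } else {
        // $opts is the options array accepted by UConverter::transcode().
        return UConverter::transcode($str, $encoding, $encoding, $opts);
    }
}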
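
A possible way to try the UConverter variant is sketched below. It assumes the intl extension is installed and skips the conversion otherwise; the input string is illustrative, and the exact substitute that appears depends on the converter's defaults, so no specific output is claimed here.

```php
<?php
$dirty = "abc\xFFdef"; // illustrative input with one ill-formed byte
$clean = null;

// UConverter is provided by the intl extension, which may be absent.
if (class_exists('UConverter')) {
    $clean = UConverter::transcode($dirty, 'UTF-8', 'UTF-8');

    // preg_match() with the /u modifier succeeds only on well-formed UTF-8.
    var_dump(preg_match('//u', $clean) === 1);
}
```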

The standard string functions and the filter functions may also need to be 
discussed, since htmlspecialchars can be used for the same purpose.

function str_scrub($str, $encoding = 'UTF-8')
{
    return htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, $encoding));
}
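
The htmlspecialchars round trip can be checked with a short sketch. The sample input and variable names are assumptions for illustration; only core PHP functions are used.

```php
<?php
$dirty = "a<b>\xFFc"; // illustrative input with one ill-formed byte

// Without ENT_SUBSTITUTE, ill-formed input yields an empty string.
$without = htmlspecialchars($dirty, ENT_COMPAT, 'UTF-8');

// With ENT_SUBSTITUTE, the ill-formed byte becomes U+FFFD.
$escaped = htmlspecialchars($dirty, ENT_SUBSTITUTE, 'UTF-8');

// Decoding undoes the escaping of "<" and ">" but keeps the substitute.
$clean = htmlspecialchars_decode($escaped);

var_dump($without === '');                // bool(true)
var_dump($clean === "a<b>\u{FFFD}c");     // bool(true)
```

The escape and decode steps exist only to reach the substitution behavior, which illustrates why a dedicated function would state the intent more clearly.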


 [2013-06-22 14:02 UTC]
Related to bug #65045.
 [2013-08-01 08:56 UTC]
-Assigned To: +Assigned To: yohgaki
 [2013-08-01 08:56 UTC]
Assigned to me so that this report is not forgotten.
 [2016-10-17 06:33 UTC]
-Status: Assigned +Status: Closed
 [2016-10-17 06:33 UTC]
A PR was submitted by the reporter and merged.