Bug #81676 Converting string to UTF using \mb_convert_encoding returns empty string
Submitted: 2021-11-30 12:10 UTC
Modified: 2021-12-05 17:35 UTC
From: anton at gamee dot com
Assigned: alexdowad
Status: Duplicate
Package: mbstring related
PHP Version: 8.1.0
OS: macOS
Private report: No
CVE-ID: None
 [2021-11-30 12:10 UTC] anton at gamee dot com
When \mb_convert_encoding() is called with the result of \mb_list_encodings() as its third parameter, it returns an empty string.

Test script:

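<?php
$string = 'test';
// Pass every encoding mbstring knows as the list of candidate source encodings.
$string = \mb_convert_encoding($string, 'UTF-8', \mb_list_encodings());
echo $string; // prints '' but 'test' was expected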

 [2021-11-30 12:28 UTC]
-Status: Open
+Status: Duplicate
-Package: Unknown/Other Function
+Package: mbstring related
-Assigned To:
+Assigned To: cmb
 [2021-11-30 12:28 UTC]
Well, encoding detection is generally hard, if not impossible, and
there have been changes lately.  Basically, your problem boils
down to <>.

Anyhow, this ticket is a duplicate of bug #81390.
 [2021-12-05 16:09 UTC]
-Assigned To: cmb
+Assigned To: alexdowad
 [2021-12-05 16:09 UTC]
Anton, thanks so much for discovering this problem.

PHP is mistakenly identifying the string 'test' as UCS-4 text. In UCS-4, it would be a single (bogus) codepoint U+74657374.

Part of the reason is that the new implementation of mb_detect_encoding (which mb_convert_encoding also uses internally when encoding detection is needed) relies on the text encoding conversion code to signal errors if the input text is not valid in a certain encoding.

By design, the conversion code for UCS-4 doesn't treat codepoints over U+10FFFF as erroneous. That is the difference between UTF-32 and UCS-4: UTF-32 requires every input character to be within the legal range for Unicode codepoints, but UCS-4 allows anything.
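
To make the arithmetic above concrete, here is a minimal sketch that reads the four bytes of 'test' as one big-endian 32-bit unit and compares the result against the Unicode limit. It uses only core PHP functions and does not call mbstring, so it illustrates the numbers and not the detection code itself. The names $codepoint and UNICODE_MAX are made up for this example.

```php
<?php
// Read the four ASCII bytes of 'test' as a single big-endian 32-bit unit,
// which is how a big-endian UCS-4 decoder would see them.
$bytes = 'test';
$codepoint = unpack('N', $bytes)[1];

// Highest codepoint that Unicode (and therefore UTF-32) permits.
// UNICODE_MAX is a name chosen for this sketch, not an mbstring constant.
const UNICODE_MAX = 0x10FFFF;

$label = sprintf('U+%08X', $codepoint);
$inUnicodeRange = $codepoint <= UNICODE_MAX;

echo $label, PHP_EOL;                                        // U+74657374
echo $inUnicodeRange ? 'in range' : 'out of range', PHP_EOL; // out of range
```

A decoder that enforces the U+10FFFF limit has a reason to reject this input, while one that accepts any 32-bit value does not.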

I see now that this characteristic of UCS-4 interacts badly with the new encoding detection code. Will have to work on that.
 [2021-12-05 17:22 UTC]
OK, my analysis was a bit wrong. I checked again and found that the string is being detected as 'UUENCODE', not UCS-4. Duh.

This is the same issue which I addressed for mb_detect_encoding in a2bc57e0e5.
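
Until a fix is available, one possible way to sidestep the problem is to pass a short, explicit list of candidate encodings instead of everything returned by mb_list_encodings(), so that transfer encodings such as UUENCODE are never considered. The following is a sketch only: the candidate list is an assumption chosen for illustration and does not come from this report, and it presumes the mbstring extension is loaded.

```php
<?php
// Assumption: the input is known to be either UTF-8 or ISO-8859-1.
// This candidate list is illustrative; adjust it to the encodings
// your application can actually receive.
$candidates = ['UTF-8', 'ISO-8859-1'];

$string = 'test';
$converted = mb_convert_encoding($string, 'UTF-8', $candidates);

var_dump($converted);
```

Because 'test' is plain ASCII, both candidates decode it to the same four characters, so the result does not depend on which one the detector picks.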