Bug #81366 mb_detect_encoding behaving different and failing to detect UTF-8
Submitted: 2021-08-16 19:14 UTC Modified: 2021-08-16 19:56 UTC
From: martin at herndl dot org Assigned:
Status: Duplicate Package: mbstring related
PHP Version: 8.1.0beta2 OS: macOS
Private report: No CVE-ID: None
 [2021-08-16 19:14 UTC] martin at herndl dot org
Description:
------------
When the encodings argument is omitted (null, which falls back to the default detect order), mb_detect_encoding behaves differently on 8.1.0beta2 and fails to detect UTF-8 in some cases with out-of-the-box settings.

I noticed this via the `ApplicationTest::testRenderExceptionWithDoubleWidthCharacters` test from Symfony in https://github.com/symfony/symfony/issues/41552 that broke because it converted UTF-8 to UTF-8 again.

Test script:
---------------
<?php

var_dump(mb_detect_order());
var_dump(mb_detect_encoding('コマンドの実行中にエラーが発生しました。', null, true));
var_dump(mb_detect_encoding('コマンドの実行中にエラーが発生しました', null, true));

Expected result:
----------------
array(2) {
  [0]=>
  string(5) "ASCII"
  [1]=>
  string(5) "UTF-8"
}
string(5) "UTF-8"
string(5) "UTF-8"

Actual result:
--------------
array(2) {
  [0]=>
  string(5) "ASCII"
  [1]=>
  string(5) "UTF-8"
}
string(5) "ASCII"
string(5) "UTF-8"

 [2021-08-16 19:49 UTC] martin at herndl dot org
A simpler example that I just found:

Test script:
---------------
<?php
var_dump(mb_detect_encoding(' § ', null, true));

Expected result:
----------------
string(5) "UTF-8"

Actual result:
--------------
string(5) "ASCII"
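
One possible stopgap while running an affected beta is to skip the null fallback and check validity per encoding, in the same order as the default detect order shown above. This is only a sketch: the helper name detect_utf8_first is hypothetical and not part of mbstring, and it assumes mb_check_encoding is not affected by the same regression.

```php
<?php
// Hypothetical helper, not part of mbstring: checks candidate encodings
// explicitly instead of relying on mb_detect_encoding($s, null, true).
function detect_utf8_first(string $s): string|false
{
    // Pure 7-bit input is valid as both ASCII and UTF-8; report ASCII first
    // to mirror the default detect order (ASCII, then UTF-8).
    if (mb_check_encoding($s, 'ASCII')) {
        return 'ASCII';
    }
    // Anything with bytes >= 0x80 must be well-formed UTF-8 to qualify.
    if (mb_check_encoding($s, 'UTF-8')) {
        return 'UTF-8';
    }
    // Neither candidate matches, same as a strict detection failure.
    return false;
}

var_dump(detect_utf8_first(" \u{00A7} ")); // the ' § ' case from the report
var_dump(detect_utf8_first('abc'));
```

The section sign is written as an escape so the result does not depend on how the script file itself is saved.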
 [2021-08-16 19:56 UTC] nikic@php.net
-Status: Open +Status: Duplicate
 [2021-08-16 19:56 UTC] nikic@php.net
Duplicate of bug #81349 (fixed in next beta).
 [2021-08-16 20:02 UTC] martin at herndl dot org
Ah I wonder how I missed that, sorry and thank you!
 
Last updated: Sat Nov 27 16:03:14 2021 UTC