Bug #81349 mb_detect_encoding misdetects ASCII in some cases
Submitted: 2021-08-11 09:24 UTC Modified: 2021-08-11 09:31 UTC
From: phofstetter at sensational dot ch Assigned: nikic (profile)
Status: Closed Package: mbstring related
PHP Version: 8.1.0beta2 OS: macOS
Private report: No CVE-ID: None


 [2021-08-11 09:24 UTC] phofstetter at sensational dot ch
mb_detect_encoding() seems to have changed quite a bit between PHP 8.0 and 8.1 and mostly for the better, TBH.

However, here's a test case where it misdetects a string as ASCII when it absolutely cannot be, because byte 0 is outside the 7-bit range of ASCII.

If byte 1 is any non-letter character other than a space, the misdetection happens. If it is a letter or a space, detection is fine.

While the function had its quirks in PHP <= 8.0, it would never detect a string containing a character with its high bit set as ASCII.
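
To make the "cannot be ASCII" claim concrete, here is a minimal sketch of a byte-level check. The helper name has_high_bit_byte() is made up for illustration and is not part of mbstring; it only shows the invariant the detection is expected to respect, not how mb_detect_encoding() works internally. The first two inputs are the byte strings used in this report's test script.

```php
<?php
// Hypothetical helper for illustration; it is not part of mbstring.
// Returns true when any byte has its high bit set (0x80-0xFF),
// which rules out 7-bit ASCII regardless of the bytes around it.
function has_high_bit_byte(string $bytes): bool
{
    // No /u modifier, so PCRE matches raw bytes rather than code points.
    return preg_match('/[\x80-\xFF]/', $bytes) === 1;
}

var_dump(has_high_bit_byte("\xe4,a")); // bool(true): cannot be ASCII
var_dump(has_high_bit_byte("\xe4 a")); // bool(true): cannot be ASCII
var_dump(has_high_bit_byte("a,a"));    // bool(false): plain 7-bit ASCII
```

Any string for which this returns true should never be reported as ASCII, whatever byte 1 happens to be.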

Test script:

echo(mb_detect_encoding("\xe4,a", ['ASCII', 'UTF-8', 'ISO-8859-1'])."\n");
echo(mb_detect_encoding("\xe4 a", ['ASCII', 'UTF-8', 'ISO-8859-1'])."\n");

Expected result:

Actual result:


 [2021-08-11 09:31 UTC]
-Assigned To: +Assigned To: nikic
 [2021-08-11 09:37 UTC]
Automatic comment on behalf of nikic
Log: Fixed bug #81349
 [2021-08-11 09:37 UTC]
-Status: Assigned +Status: Closed