Bug #81349 mb_detect_encoding misdetects ASCII in some cases
Submitted: 2021-08-11 09:24 UTC
Modified: 2021-08-11 09:31 UTC
From: phofstetter at sensational dot ch
Assigned: nikic
Status: Closed
Package: mbstring related
PHP Version: 8.1.0beta2
OS: macOS
Private report: No
CVE-ID: None
 [2021-08-11 09:24 UTC] phofstetter at sensational dot ch
Description:
------------
mb_detect_encoding() seems to have changed quite a bit between PHP 8.0 and 8.1 and mostly for the better, TBH.

However, here is a test case where it misdetects a string as ASCII even though the string cannot be ASCII: byte 0 (0xE4) is outside the 7-bit range of ASCII.

If byte 1 is any non-letter character other than a space, the misdetection happens. If it is a letter or a space, the result is correct.

While the function had its quirks in PHP <= 8.0, it would never detect a string containing a byte with its high bit set as ASCII.

Test script:
---------------
<?php

echo(mb_detect_encoding("\xe4,a", ['ASCII', 'UTF-8', 'ISO-8859-1'])."\n");
echo(mb_detect_encoding("\xe4 a", ['ASCII', 'UTF-8', 'ISO-8859-1'])."\n");


Expected result:
----------------
ISO-8859-1
ISO-8859-1

Actual result:
--------------
ASCII
ISO-8859-1
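
The expectation above rests on a simple rule: a string containing any byte in the range 0x80-0xFF cannot be ASCII. As a rough sketch of that rule, and not the logic mb_detect_encoding() itself uses, the snippet below checks for high-bit bytes and then picks the first candidate encoding in which the bytes are valid via mb_check_encoding(). The helper names has_high_bit_byte() and first_valid_encoding() are hypothetical and exist only for this illustration.

```php
<?php

/**
 * Hypothetical helper: true if any byte has its high bit set (0x80-0xFF).
 * Such a string cannot be 7-bit ASCII.
 */
function has_high_bit_byte(string $bytes): bool
{
    // No /u modifier, so PCRE matches raw bytes rather than code points.
    return preg_match('/[\x80-\xFF]/', $bytes) === 1;
}

/**
 * Hypothetical helper: return the first candidate encoding in which the
 * byte string is valid, or false if none matches. This is a strict
 * validity check, not the heuristic scoring done by mb_detect_encoding().
 */
function first_valid_encoding(string $bytes, array $candidates)
{
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($bytes, $encoding)) {
            return $encoding;
        }
    }
    return false;
}

$candidates = ['ASCII', 'UTF-8', 'ISO-8859-1'];

// The two inputs from the test script above.
foreach (["\xe4,a", "\xe4 a"] as $input) {
    var_dump(has_high_bit_byte($input));
    var_dump(first_valid_encoding($input, $candidates));
}
```

Because this approach validates each candidate in order instead of guessing, the character following 0xE4 does not affect whether ASCII is accepted. It only works as a stand-in when the candidate list is ordered from most to least restrictive, since ISO-8859-1 accepts every byte sequence.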

History

 [2021-08-11 09:31 UTC] nikic@php.net
-Assigned To: +Assigned To: nikic
 [2021-08-11 09:37 UTC] git@php.net
Automatic comment on behalf of nikic
Revision: https://github.com/php/php-src/commit/28500fe4ef1218e04e830ae94d889c4b6c67940d
Log: Fixed bug #81349
 [2021-08-11 09:37 UTC] git@php.net
-Status: Assigned +Status: Closed
 