[2021-08-11 09:31 UTC] nikic@php.net
-Assigned To:
+Assigned To: nikic
[2021-08-11 09:37 UTC] git@php.net
[2021-08-11 09:37 UTC] git@php.net
-Status: Assigned
+Status: Closed
Description:
------------
mb_detect_encoding() seems to have changed quite a bit between PHP 8.0 and 8.1, and mostly for the better, TBH.

However, here's a test case where it misdetects a string as ASCII when it absolutely cannot be, because byte 0 is outside the 7-bit range of ASCII.

If byte 1 is any non-letter character other than a space, the misdetection happens. If it's a letter or a space, detection is fine.

While the function had its quirks in PHP <= 8.0, it would never detect a string containing a byte with its high bit set as ASCII.

Test script:
---------------
<?php

echo(mb_detect_encoding("\xe4,a", ['ASCII', 'UTF-8', 'ISO-8859-1'])."\n");
echo(mb_detect_encoding("\xe4 a", ['ASCII', 'UTF-8', 'ISO-8859-1'])."\n");

Expected result:
----------------
ISO-8859-1
ISO-8859-1

Actual result:
--------------
ASCII
ISO-8859-1
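
For anyone who needs a stopgap on an affected build, here is a minimal sketch of a validity-based check that can stand in for the heuristic detection. It assumes the mbstring extension is loaded. The helper names has_high_bit_byte() and first_valid_encoding() are hypothetical and not part of PHP. It simply returns the first candidate the string is valid in, and it explicitly refuses ASCII whenever any byte has its high bit set, which is the property the report relies on.

```php
<?php

// Hypothetical helper (not part of mbstring): true if any byte is >= 0x80,
// i.e. the string cannot be 7-bit ASCII.
function has_high_bit_byte(string $s): bool
{
    // No /u modifier, so PCRE matches raw bytes here.
    return preg_match('/[\x80-\xFF]/', $s) === 1;
}

// Hypothetical helper: return the first candidate encoding in which the
// string is valid, or null if none matches. This is a validity check,
// not a statistical guess like mb_detect_encoding().
function first_valid_encoding(string $s, array $candidates): ?string
{
    foreach ($candidates as $enc) {
        // Guard for the case in this report: never accept ASCII
        // when a byte with the high bit set is present.
        if ($enc === 'ASCII' && has_high_bit_byte($s)) {
            continue;
        }
        if (mb_check_encoding($s, $enc)) {
            return $enc;
        }
    }
    return null;
}

$candidates = ['ASCII', 'UTF-8', 'ISO-8859-1'];
echo first_valid_encoding("\xe4,a", $candidates), "\n";
echo first_valid_encoding("\xe4 a", $candidates), "\n";
```

Because candidates are tried in order, the order of the list matters: a single-byte encoding such as ISO-8859-1 accepts any byte sequence, so it should come last.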