Bug #77766 mb_detect_encoding() returns false for strings with any chr(128..193|245..255)
Submitted: 2019-03-18 14:55 UTC Modified: 2019-03-18 15:47 UTC
From: ca at lsp dot net Assigned:
Status: Not a bug Package: mbstring related
PHP Version: 7.2.16 OS: Linux + Windows
Private report: No CVE-ID: None
 [2019-03-18 14:55 UTC] ca at lsp dot net
Description:
------------
`mb_detect_encoding()` returns `false` if `$str` contains a byte with a value between 128 and 193 or between 245 and 255.

I noticed this after getting exceptions when importing emails (into MySQL 8 using PDO with utf8mb4). The email contained raw non-breaking spaces (i.e. `&nbsp;` in decoded form, the single byte `\xA0`) and, apparently, PDO couldn't properly encode them:

$sql = "INSERT INTO `emails` (..., `body`, ...) VALUES (..., '...\xA0...', ...)";
$pdo->query($sql);

SQLSTATE[HY000]: General error: 1366 Incorrect string value: '...\xA0...' for column ...

Workaround: Encode the value or query conditionally (if `mb_detect_encoding() === false`) using `utf8_encode`.

P.S.: Quoting the value using `PDO::quote` and/or using a prepared statement yielded the same result.
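
The workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `to_utf8` is hypothetical, the input is assumed to be ISO-8859-1 whenever it is not valid UTF-8, and `mb_check_encoding()` plus `mb_convert_encoding()` are swapped in for the `mb_detect_encoding() === false` check and `utf8_encode` (which is deprecated as of PHP 8.2).

```php
<?php
// Hypothetical helper, not part of the original report.
// Returns $value unchanged when it is already valid UTF-8; otherwise
// reinterprets its bytes as ISO-8859-1 (an assumption about the source data)
// and converts them to UTF-8.
function to_utf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// The lone 0xA0 byte becomes the two-byte UTF-8 sequence 0xC2 0xA0.
$body = to_utf8("Hello\xA0world!");
echo bin2hex($body), "\n";
```

The converted value can then be bound to a prepared statement as usual. If the mail declares its own charset, that declared charset is probably a better choice than a hard-coded ISO-8859-1.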

---
From manual page: https://php.net/function.mb-detect-encoding
---


Test script:
---------------
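<?php
// A full query string containing a raw 0xA0 byte.
var_dump(
    mb_detect_encoding("INSERT INTO `emails` (`body`) VALUES ('Hello\xA0world!')")
);

// Print every single-byte value for which detection fails
// with the default detect order.
for ($i = 0; $i <= 255; $i++) {
    $chr = chr($i);
    if (false === mb_detect_encoding($chr)) {
        printf("%d ", $i);
    }
}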

Expected result:
----------------
string(5) "UTF-8"

Actual result:
--------------
bool(false)
128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 245 246 247 248 249 250 251 252 253 254 255

 [2019-03-18 15:47 UTC] requinix@php.net
-Status: Open
+Status: Not a bug
 [2019-03-18 15:47 UTC] requinix@php.net
mb_detect_encoding() needs to know which encodings you want to support. Because very many encodings overlap with each other, it does not try to test every encoding it knows. The default list comes from mb_detect_order(), which is probably just "ASCII", i.e. the bytes up to 127.

Adding a
  mb_detect_order("ISO-8859-1");
(or using that encoding with mb_detect_encoding) will make the script work.
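
A minimal sketch of both options, assuming the mbstring extension is loaded. The candidate list `ASCII, UTF-8, ISO-8859-1` and the use of strict mode are choices made for this illustration, not something stated in the report. With this list the sample string should be reported as ISO-8859-1, since the lone `\xA0` byte is valid in neither ASCII nor UTF-8.

```php
<?php
$sample = "Hello\xA0world!";

// Illustrative candidate list; order matters, earlier entries are preferred.
$candidates = ['ASCII', 'UTF-8', 'ISO-8859-1'];

// Option 1: pass the candidate encodings for this call only.
// The third argument enables strict mode.
$perCall = mb_detect_encoding($sample, $candidates, true);
var_dump($perCall);

// Option 2: change the default detect order for later calls.
mb_detect_order($candidates);
$viaDefault = mb_detect_encoding($sample, mb_detect_order(), true);
var_dump($viaDefault);
```

Listing ISO-8859-1 last matters because every byte sequence is valid in that encoding, so it acts as a catch-all once the stricter candidates have been ruled out.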
 