Bug #77907 mb-functions do not respect default_encoding
Submitted: 2019-04-16 14:06 UTC Modified: 2019-04-17 12:12 UTC
From: n dot scheer at binserv dot de Assigned: nikic
Status: Closed Package: *Unicode Issues
PHP Version: 7.2.17 OS: CentOS 7.6.1810
Private report: No CVE-ID: None
 
 [2019-04-16 14:06 UTC] n dot scheer at binserv dot de
Description:
------------
The documentation states (cf. https://www.php.net/manual/en/ini.core.php#ini.default-charset):

"In PHP 5.6 onwards, "UTF-8" is the default value and [...] The value of default_charset
will also be used to set the default character set for [...] and for mbstring functions
if the mbstring.http_input mbstring.http_output mbstring.internal_encoding
configuration option is unset."

As such, I'd expect to be able to set default_charset to iso-8859-1 and mbstring to pick that same setting for its internal encoding (if the mentioned directives are unset, that is).

In the test script below, mb_strlen should return 2 in both cases, because ISO-8859-1 is a single-byte encoding. It does not: the two bytes are instead interpreted as the single UTF-8 character "ö".
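Passing the encoding to mb_strlen explicitly bypasses the internal encoding and shows the difference described above directly. A minimal sketch; the variable name `$bytes` is only illustrative:

```php
<?php
// The two bytes 0xC3 0xB6 encode one character ("ö") in UTF-8,
// but two characters ("Ã" and "¶") in the single-byte ISO-8859-1.
$bytes = "\xc3\xb6";

// An explicit encoding argument ignores mb_internal_encoding() entirely.
var_dump(mb_strlen($bytes, 'UTF-8'));      // int(1)
var_dump(mb_strlen($bytes, 'ISO-8859-1')); // int(2)
var_dump(mb_strlen($bytes, '8bit'));       // int(2), the raw byte count
```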

If mb_internal_encoding is set explicitly, the mb-functions work as expected.

It should not be necessary to call mb_internal_encoding() or to set the corresponding ini directive, though, because default_charset is documented to serve as the default.
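On a PHP version without the fix for this bug (the pull request further down was expected to land in 7.4 only), one possible workaround is to copy default_charset into mbstring's internal encoding at startup. A minimal sketch, assuming the mbstring extension is loaded and mbstring.internal_encoding is left unset; the helper name sync_mb_encoding_with_default_charset() is hypothetical:

```php
<?php
// Hypothetical helper: mirror default_charset into mbstring's internal
// encoding, as the documentation describes, when mbstring has no own setting.
function sync_mb_encoding_with_default_charset(): void
{
    // Respect an explicitly configured mbstring.internal_encoding.
    if ((string) ini_get('mbstring.internal_encoding') !== '') {
        return;
    }
    $charset = (string) ini_get('default_charset');
    if ($charset !== '') {
        mb_internal_encoding($charset);
    }
}

ini_set('default_charset', 'iso-8859-1');
sync_mb_encoding_with_default_charset();
var_dump(mb_internal_encoding()); // string(10) "ISO-8859-1"
var_dump(mb_strlen("\xc3\xb6"));  // int(2)
```

The helper has to run after any runtime change to default_charset, since it copies the value once rather than tracking it.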

Test script:
---------------
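<?php

// Only default_charset is changed; the mbstring.* directives stay unset.
ini_set('default_charset', 'iso-8859-1');
var_dump(ini_get("mbstring.internal_encoding"));
var_dump(ini_get("mbstring.http_input"));
var_dump(ini_get("mbstring.http_output"));
var_dump(mb_internal_encoding());
// "\xc3\xb6" is one character ("ö") in UTF-8, but two in ISO-8859-1.
var_dump(mb_strlen( "\xc3\xb6" ));
var_dump(mb_strlen( "\xc3\xb6", '8bit' )); // byte count, always 2

// Setting the internal encoding explicitly gives the expected result.
mb_internal_encoding('iso-8859-1');
var_dump(mb_internal_encoding());
var_dump(mb_strlen( "\xc3\xb6" ));
var_dump(mb_strlen( "\xc3\xb6", '8bit' ));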

Expected result:
----------------
string(0) ""
string(0) ""
string(0) ""
string(10) "ISO-8859-1"
int(2)
int(2)
string(10) "ISO-8859-1"
int(2)
int(2)

Actual result:
--------------
string(0) ""
string(0) ""
string(0) ""
string(5) "UTF-8"
int(1)
int(2)
string(10) "ISO-8859-1"
int(2)
int(2)

History

 [2019-04-16 14:13 UTC] nikic@php.net
-Assigned To: +Assigned To: nikic
 [2019-04-16 14:13 UTC] nikic@php.net
Working on this right now...
 [2019-04-16 14:40 UTC] nikic@php.net
https://github.com/php/php-src/pull/4035

Probably 7.4 only; this is a non-trivial change.
 [2019-04-17 12:08 UTC] nikic@php.net
-Status: Assigned +Status: Closed
 [2019-04-17 12:12 UTC] n dot scheer at binserv dot de
That was fast - thank you! :)
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Mar 28 17:01:29 2024 UTC