Bug #72992 mbstring.internal_encoding doesn't inherit default_charset
Submitted: 2016-09-01 12:00 UTC Modified: 2019-05-07 14:12 UTC
Votes:4
Avg. Score:4.5 ± 0.5
Reproduced:3 of 3 (100.0%)
Same Version:1 (33.3%)
Same OS:1 (33.3%)
From: zoeslam at gmail dot com Assigned: nikic (profile)
Status: Closed Package: mbstring related
PHP Version: 7.0.10 OS: Ubuntu 16.04
Private report: No CVE-ID: None
 [2016-09-01 12:00 UTC] zoeslam at gmail dot com
Description:
------------
Likely related to https://bugs.php.net/bug.php?id=70035 but internal_encoding is now deprecated

php -d default_charset='ISO-8859-1' -i | grep -E '(charset|encoding)'

default_charset => ISO-8859-1 => ISO-8859-1
input_encoding => no value => no value
internal_encoding => no value => no value
output_encoding => no value => no value
zend.script_encoding => no value => no value
iconv.input_encoding => no value => no value
iconv.internal_encoding => no value => no value
iconv.output_encoding => no value => no value
HTTP input encoding translation => disabled
mbstring.encoding_translation => Off => Off
mbstring.internal_encoding => no value => no value

Test script:
---------------
php -d default_charset='ISO-8859-1' -r 'var_dump(mb_internal_encoding());'
php -r 'ini_set("default_charset", "ISO-8859-1"); var_dump(mb_internal_encoding());'

Expected result:
----------------
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"

Actual result:
--------------
string(5) "UTF-8"
string(5) "UTF-8"

Patches

bug72992.patch (last revision 2016-09-08 04:12 UTC by yohgaki@php.net)

History

 [2016-09-06 08:55 UTC] yohgaki@php.net
-Assigned To: +Assigned To: yohgaki
 [2016-09-08 04:12 UTC] yohgaki@php.net
The following patch has been added/updated:

Patch Name: bug72992.patch
Revision:   1473307940
URL:        https://bugs.php.net/patch-display.php?bug=72992&patch=bug72992.patch&revision=1473307940
 [2016-09-08 04:14 UTC] yohgaki@php.net
INI values could be propagated, but that is not necessary. If a higher-precedence INI setting is set and a lower-precedence one is not, the higher one is used.

I'll commit the fix for the logic error only.
 [2016-09-08 04:57 UTC] yohgaki@php.net
Automatic comment on behalf of yohgaki
Revision: http://git.php.net/?p=php-src.git;a=commit;h=8bbd0952e5bba88426bac1596dcc3bfa504dbe4e
Log: Fix Bug #72992 mbstring.internal_encoding doesn't inherit default_charset
 [2016-09-08 04:57 UTC] yohgaki@php.net
-Status: Assigned +Status: Closed
 [2016-09-09 11:37 UTC] zoeslam at gmail dot com
Hi, I recompiled from source with your commit, but nothing changed.

Could you explain your comment about INI settings in more detail?

Reading the manual at http://php.net/manual/en/mbstring.configuration.php#ini.mbstring.internal-encoding, I assume that when every setting except "default_charset" is left empty, mbstring will use the "default_charset" value for the internal encoding. Is that right?

If so, the bug is still present.
 [2016-09-10 01:57 UTC] yohgaki@php.net
So you would like to have INI propagation as in the attached patch?
 [2016-09-10 02:01 UTC] yohgaki@php.net
-Status: Closed +Status: Re-Opened
 [2016-09-10 02:01 UTC] yohgaki@php.net
I'll just commit the INI propagation part of the attached patch, too. Then you'll see the modified INI values.
 [2016-09-12 01:10 UTC] yohgaki@php.net
-Status: Re-Opened +Status: Closed
 [2016-09-12 01:10 UTC] yohgaki@php.net
It seems the current INI system cannot handle INI propagation at startup well. I don't want to add ugly hacks, so I'll leave it as it is now.

Mbstring's INI value is not significant when higher precedence INI is set.
 [2016-09-12 01:53 UTC] yohgaki@php.net
Mbstring's INI value is not significant when higher precedence INI is set when mbstring INI is not set.
 [2016-09-12 06:41 UTC] zoeslam at gmail dot com
I am sorry I still don't get what you say, but this is not important, I rely on your analysis.

If:
1) "mbstring.internal_encoding" ini setting is deprecated
2) "default_charset" doesn't propagate to mb_internal_encoding()
3) and there is no fix for that

At least the documentation should be updated to report that the only way to change mb_internal_encoding() is to call mb_internal_encoding() itself.

May we update the manual please?
 [2016-09-12 07:38 UTC] yohgaki@php.net
I noticed that mb_internal_encoding() should be modified to reflect the encoding actually used. Thank you.

Anyway, the current implementation does not change lower-precedence INI values. The encoding used is determined as follows (this is the iconv code, because it is simpler):

static char *get_internal_encoding(void) {
	if (ICONVG(internal_encoding) && ICONVG(internal_encoding)[0]) {
		return ICONVG(internal_encoding);
	} else if (PG(internal_encoding) && PG(internal_encoding)[0]) {
		return PG(internal_encoding);
	} else if (SG(default_charset)) {
		return SG(default_charset);
	}
	return "";
}

As you can see, INI setting precedence is kept as defined in the RFC.

Mbstring's encoding handling is not as simple as iconv's. If I'm missing something, please let me know.
 [2016-10-17 10:08 UTC] bwoebi@php.net
Automatic comment on behalf of yohgaki
Revision: http://git.php.net/?p=php-src.git;a=commit;h=8bbd0952e5bba88426bac1596dcc3bfa504dbe4e
Log: Fix Bug #72992 mbstring.internal_encoding doesn't inherit default_charset
 [2017-09-08 17:18 UTC] gerthaubrich at web dot de
Hi, from my point of view this does NOT seem to be fixed in 5.6.31, 7.0.22 and 7.1.8 (self-compiled on Debian 8 x64).


I did not find this bug report at first and tried to submit a new one, but luckily there is a fail-safe. Here are my details:


Hello, I am using Debian 8 x64 Apache/FPM and have compiled 5.6.31, 7.0.22 and 7.1.8 successfully (using php.net TAR-packages). Reviewing multi-byte functions and config options, I noticed that the value of mbstring.internal_encoding does not change in either version, if default_charset is set via ini_set() in the script or even if it is changed directly in php.ini. Also setting internal_encoding in php.ini does not change the mbstring value.

The official documentation is quite limited, and the notes in the example php.ini do not seem to match the observed behaviour:

; Use of this INI entry is DEPRECATED, use global internal_encoding instead.
; internal/script encoding.
; Some encoding cannot work as internal encoding, e.g. SJIS, BIG5, ISO-2022-*
; If empty, default_charset or internal_encoding or iconv.internal_encoding is
; used. The precedence is:
;    default_charset < internal_encoding < iconv.internal_encoding
;;mbstring.internal_encoding =

First, "iconv.internal_encoding" seems to be a copy&paste error, I assume. I guess it should be "mbstring.internal_encoding", right?

Second, the value I read with mb_internal_encoding(), and the behaviour I see when I use, for example, implicit conversion via zend.multibyte=On or explicit conversion via mb_convert_encoding() without passing a source encoding as the third parameter, suggest that the value does not change unless it is set directly in php.ini or via ini_set().
Setting default_charset or internal_encoding in php.ini does NOT implicitly change mbstring's internal encoding value in the PHP versions mentioned above (tested, for example, with ISO-8859-1 and ISO-8859-15).

Have I missed something in the docs or is this an unintended behaviour? It is hard to believe, that this might be some system-specific issue for my self-compiled versions, but it is also strange that all three major versions behave in the same way. (Maybe I have missed something.)

In contrast, iconv behaves as the comments in php.ini suggest. With default_charset = A, all three iconv encodings implicitly change to A. If I additionally set internal_encoding = B, iconv.internal_encoding also changes to B, whereas the remaining two still use A.

As far as I can see it right now, mbstring's internal encoding can currently only be changed directly.
It currently uses UTF-8 for all my PHP versions. What is the source for this default value? Is it the implicit DEFAULT value of default_charset or is it the system locale, which also uses UTF-8 on my system? Nevertheless, the options usage seems to be inconsistent.

--------

I would expect that changing or setting default_charset and/or internal_encoding in php.ini would also affect mbstring's internal default encoding (as the comments suggest and as it does for iconv). Additionally, it's a shame that the INI comments provide more "insight" than the official documentation.


Note 1: It would be nice, if phpinfo() would display inherited values instead of "no value" in the table (for e.g. internal_encoding, mbstring.internal_encoding, etc.). Seeing effective values would be helpful. For example "<UTF-8>" to mark, that the value has not been explicitly set in php.ini, but instead has been derived from another option in the inheritance chain. (The HTML-output also could use CSS for special formatting.)


Note 2: Option default_mimetype takes its additional charset information ("Content-Type: text/html; charset=utf-8") only from option default_charset. With option output_encoding available, it would be more straightforward to use that one as the primary source, which implicitly defaults to default_charset (if it is not explicitly set).


Best regards,
Gert
 [2017-09-08 19:36 UTC] yohgaki@php.net
-Status: Closed +Status: Re-Opened
 [2017-09-27 09:53 UTC] gerthaubrich at web dot de
Tested the behavior for self-compiled PHP 7.0.23, 7.1.9 and 7.2.0RC2 and it's the same problem (on Debian 8 x64). Default or set values on default_charset and internal_encoding are not handed down to mbstring.internal_encoding. It remains on its default value 'UTF-8' unless it is changed directly. On the other hand, changes are correctly reflected by iconv encodings.

Example:

	<?php
	header('Content-Type: text/plain');
	header('Cache-Control: no-cache, no-store, must-revalidate');
	error_reporting(E_ALL);

	echo "mbstring internal encoding inheritance test: PHP " . PHP_VERSION . date(' , Y-m-d H:i:s') . "\n";

	echo "\nBEFORE\n";
	echo "default_charset       : '" . ini_get('default_charset') . "'\n";
	echo "input_encoding        : '" . ini_get('input_encoding') . "'\n";
	echo "internal_encoding     : '" . ini_get('internal_encoding') . "'\n";
	echo "output_encoding       : '" . ini_get('output_encoding') . "'\n";
	echo "mb_internal_encoding(): '" . mb_internal_encoding() . "'\n";
	echo "iconv_get_encoding()  :\n" . var_export(iconv_get_encoding('all'), TRUE) . "\n";

	sleep(1);
	ini_set('default_charset', 'Windows-1252');
	ini_set('output_encoding', 'ISO-8859-15');

	echo "\nAFTER\n";
	echo "default_charset       : '" . ini_get('default_charset') . "'\n";
	echo "input_encoding        : '" . ini_get('input_encoding') . "'\n";
	echo "internal_encoding     : '" . ini_get('internal_encoding') . "'\n";
	echo "output_encoding       : '" . ini_get('output_encoding') . "'\n";
	echo "mb_internal_encoding(): '" . mb_internal_encoding() . "'\n";
	echo "iconv_get_encoding()  :\n" . var_export(iconv_get_encoding('all'), TRUE) . "\n";
	?>

All versions were compiled with option --enable-mbstring, whereas iconv is enabled by default. For this test, default_charset = "GB18030" was set in php.ini, and mbstring.internal_encoding and all three iconv encoding options remain empty/unset in php.ini:

	mbstring internal encoding inheritance test: PHP 7.2.0RC2 , 2017-09-27 08:47:18

	BEFORE
	default_charset       : 'GB18030'
	input_encoding        : ''
	internal_encoding     : ''
	output_encoding       : ''
	mb_internal_encoding(): 'UTF-8'
	iconv_get_encoding()  :
	array (
	  'input_encoding' => 'GB18030',
	  'output_encoding' => 'GB18030',
	  'internal_encoding' => 'GB18030',
	)

	AFTER
	default_charset       : 'Windows-1252'
	input_encoding        : ''
	internal_encoding     : ''
	output_encoding       : 'ISO-8859-15'
	mb_internal_encoding(): 'UTF-8'
	iconv_get_encoding()  :
	array (
	  'input_encoding' => 'Windows-1252',
	  'output_encoding' => 'ISO-8859-15',
	  'internal_encoding' => 'Windows-1252',
	)

It is the same result for all three PHP versions. Additionally setting internal_encoding directly in php.ini does not implicitly change mbstring.internal_encoding.
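
The fallback observed for iconv in the output above can be modelled with a minimal C sketch. It assumes that all three iconv.* settings are unset, as in this test. The struct and function names are hypothetical and are not taken from php-src.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the global INI values; NULL or "" means "not set". */
struct global_ini {
    const char *default_charset;
    const char *input_encoding;
    const char *internal_encoding;
    const char *output_encoding;
};

/* Returns 1 when a setting holds a non-empty string. */
static int is_set(const char *value) {
    return value != NULL && value[0] != '\0';
}

/* Uses the specific setting when set, otherwise default_charset, otherwise "". */
static const char *with_fallback(const char *specific, const char *default_charset) {
    if (is_set(specific)) {
        return specific;
    }
    return is_set(default_charset) ? default_charset : "";
}

/* Effective iconv encodings, assuming the iconv.* INI entries are all unset. */
static const char *iconv_effective_input(const struct global_ini *ini) {
    assert(ini != NULL);
    return with_fallback(ini->input_encoding, ini->default_charset);
}

static const char *iconv_effective_internal(const struct global_ini *ini) {
    assert(ini != NULL);
    return with_fallback(ini->internal_encoding, ini->default_charset);
}

static const char *iconv_effective_output(const struct global_ini *ini) {
    assert(ini != NULL);
    return with_fallback(ini->output_encoding, ini->default_charset);
}
```

With default_charset = "Windows-1252" and output_encoding = "ISO-8859-15", this model yields Windows-1252 for input and internal and ISO-8859-15 for output, which matches the AFTER section. mb_internal_encoding() does not follow this pattern in the tested versions.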
 [2017-09-27 10:40 UTC] gerthaubrich at web dot de
POSSIBLE ISSUE with FPM ? (permanent encoding changes)

(Re)starting the FPM instance (compiled without OPcache, with no other instance or version running) and setting only default_charset = "GB18030" in php.ini returns the output above, initially ("BEFORE") with 'iconv.output_encoding' => 'GB18030' (as expected).
If I reload the page, even multiple times, I now get ("BEFORE") 'iconv.output_encoding' => 'ISO-8859-15' for every following request, until I perform a systemctl reload or systemctl restart for this FPM instance.

--> It seems as if the FPM instance does permanently store indirect in-script changes to iconv encodings (and not only for the duration of the script).

The behavior is reproducible. If php.ini uses 'Windows-1251' instead, the first request returns this for 'iconv.output_encoding', but following requests always return 'ISO-8859-15' in the "BEFORE" section.
If I extend the test script and also set 'internal_encoding' and 'input_encoding' explicitly, these changes are also handed over to iconv and are permanently stored for this FPM instance.
The problem is also independent of the browser: e.g. the first request shows the initial values in Chrome, and the second request, initiated via Firefox, shows altered initial values for all three iconv encodings.

I noted this behavior in PHP 7.0.23, 7.1.9 and also 7.2.0RC2.
Additionally self-compiled PHP 5.6.31 behaves more strangely. It also seems to permanently store inherited iconv changes in its FPM instance, but the initially "BEFORE" iconv values seem to change randomly with every following request and additionally display corrupted (non-ascii) strings in the output. (The "AFTER" values are displayed correctly.)

--> On the other hand, using the deprecated function iconv_set_encoding() does not result in permanent storage within the FPM instance; changes are only active for the duration of the script (as it should be). It seems that the code propagating values from default_charset and the *_encoding INI options to iconv replaces the initially read, global php.ini settings in the FPM instance instead of only modifying the script's runtime values.

--> For PHP 5.6.31 it seems that the already invalid permanent storage additionally uses malformed C code to copy/replace the encoding name strings. It seems as if the code does not properly recognize the beginning of the strings and starts copying in the middle of the strings, or even copies from some random/invalid pointer?!

--> Also note that permanent storage in FPM instances might also affect mbstring.internal_encoding value inheritance, IF this functionality WOULD work as it should (if the code is the same as for iconv and has been copied and only adjusted).
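
To illustrate the hypothesis above (and only the hypothesis, this is not a diagnosis of the FPM code), here is a minimal C sketch contrasting a write-back into process-wide state with resolution on read. All names are hypothetical and do not correspond to php-src symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: state that lives as long as the FPM worker process. */
struct worker_globals {
    const char *iconv_output_encoding; /* as read from php.ini; "" when unset */
};

/* Hypothetical: state that should live only for one request. */
struct request_state {
    const char *output_encoding; /* set via ini_set() in the script; "" when unset */
};

/* Returns 1 when a setting holds a non-empty string. */
static int is_set(const char *value) {
    return value != NULL && value[0] != '\0';
}

/* Suspected faulty pattern: the propagated value overwrites process-wide state. */
static void propagate_by_write_back(struct worker_globals *g,
                                    const struct request_state *r) {
    assert(g != NULL && r != NULL);
    if (is_set(r->output_encoding)) {
        g->iconv_output_encoding = r->output_encoding; /* survives the request */
    }
}

/* Alternative: resolve at read time and leave process-wide state untouched. */
static const char *resolve_on_read(const struct worker_globals *g,
                                   const struct request_state *r,
                                   const char *default_charset) {
    assert(g != NULL && r != NULL);
    if (is_set(g->iconv_output_encoding)) {
        return g->iconv_output_encoding;
    }
    if (is_set(r->output_encoding)) {
        return r->output_encoding;
    }
    return is_set(default_charset) ? default_charset : "";
}
```

In this model, once propagate_by_write_back() has run during one request, a later request with no override still resolves to the earlier request's value, which is the symptom described for the "BEFORE" section.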

Finally, I want to mention that I cannot be sure that this observed behavior is not connected to my specific system. All PHP versions are compiled mostly statically, resulting in an approx. 40 MB binary. PHP 5.6.31 and 7.0.23 are compiled against updated distro packages only; PHP 7.1.9 and 7.2.0RC2 additionally use custom-built libxml 2.9.4, libxslt 1.1.29 and tidy 5.4.0; and 7.2.0RC2 additionally uses libargon2 20161029, libsodium 1.0.13 and libzip 1.3.0. I cannot imagine how any of these libs could affect this behavior.

Best regards,
Gert
 [2017-09-27 13:07 UTC] gerthaubrich at web dot de
Compared to Debian 8.9 (clean) distro setup with Apache 2.4.10 (prefork) with mod_php5 5.6.30-0+deb8u1:
mbstring.internal_encoding also does NOT inherit values from default_charset or internal_encoding, but here the PHP Apache-module does not permanently store in-script encoding changes in iconv. The previously noted problems seem to be specific for the FPM-module. / BR
 [2017-10-24 03:07 UTC] kalle@php.net
-Status: Re-Opened +Status: Assigned
 [2019-05-07 14:12 UTC] nikic@php.net
-Status: Assigned +Status: Closed -Assigned To: yohgaki +Assigned To: nikic
 [2019-05-07 14:12 UTC] nikic@php.net
This issue should (finally!) be fixed in PHP 7.4, see also bug #77907.

I should mention a caveat here: The ini settings themselves will not be inherited, i.e. setting default_charset is not going to magically modify mbstring.internal_encoding or similar. But the actually used encoding will properly fall back through internal_encoding and default_charset.
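
The distinction between the stored ini value and the encoding actually used can be sketched in C as follows. This is a minimal model under the assumption that the fallback order is mbstring.internal_encoding, then internal_encoding, then default_charset; the struct and function names are hypothetical and are not the PHP 7.4 implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant INI values; NULL or "" means "not set". */
struct ini_values {
    const char *mbstring_internal_encoding;
    const char *internal_encoding;
    const char *default_charset;
};

/* Returns 1 when a setting holds a non-empty string. */
static int is_set(const char *value) {
    return value != NULL && value[0] != '\0';
}

/* The stored ini value: reported as-is, with no inheritance applied. */
static const char *reported_mbstring_ini(const struct ini_values *ini) {
    assert(ini != NULL);
    return ini->mbstring_internal_encoding ? ini->mbstring_internal_encoding : "";
}

/* The encoding actually used: first non-empty value along the fallback chain. */
static const char *effective_mbstring_encoding(const struct ini_values *ini) {
    assert(ini != NULL);
    if (is_set(ini->mbstring_internal_encoding)) {
        return ini->mbstring_internal_encoding;
    }
    if (is_set(ini->internal_encoding)) {
        return ini->internal_encoding;
    }
    if (is_set(ini->default_charset)) {
        return ini->default_charset;
    }
    return "";
}
```

With only default_charset set, reported_mbstring_ini() still returns an empty string while effective_mbstring_encoding() returns the default_charset value, which is the behavior the caveat describes.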
 