[2016-09-01 12:00 UTC] zoeslam at gmail dot com
Description:
------------
Likely related to https://bugs.php.net/bug.php?id=70035, but internal_encoding is now deprecated.

php -d default_charset='ISO-8859-1' -i | grep -E '(charset|encoding)'
default_charset => ISO-8859-1 => ISO-8859-1
input_encoding => no value => no value
internal_encoding => no value => no value
output_encoding => no value => no value
zend.script_encoding => no value => no value
iconv.input_encoding => no value => no value
iconv.internal_encoding => no value => no value
iconv.output_encoding => no value => no value
HTTP input encoding translation => disabled
mbstring.encoding_translation => Off => Off
mbstring.internal_encoding => no value => no value

Test script:
---------------
php -d default_charset='ISO-8859-1' -r 'var_dump(mb_internal_encoding());'
php -r 'ini_set("default_charset", "ISO-8859-1"); var_dump(mb_internal_encoding());'

Expected result:
----------------
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"

Actual result:
--------------
string(5) "UTF-8"
string(5) "UTF-8"

Patches:
bug72992.patch (last revision 2016-09-08 04:12 UTC by yohgaki@php.net)
I noticed that mb_internal_encoding() should be modified to reflect the encoding actually used. Thank you.

Anyway, the current implementation does not change lower-precedence INI values. The encoding used is determined as follows (this is the iconv code, because it is simpler):

    static char *get_internal_encoding(void)
    {
        if (ICONVG(internal_encoding) && ICONVG(internal_encoding)[0]) {
            return ICONVG(internal_encoding);
        } else if (PG(internal_encoding) && PG(internal_encoding)[0]) {
            return PG(internal_encoding);
        } else if (SG(default_charset)) {
            return SG(default_charset);
        }
        return "";
    }

As you can see, INI setting precedence is kept as defined in the RFC. Mbstring's encoding handling is not as simple as iconv's; if I'm missing something, please let me know.

Hi, this seems NOT to be fixed for 5.6.31, 7.0.22 and 7.1.8 from my point of view (self-compiled on Debian 8 x64). I did not find this bug at first and tried to submit a new one, but luckily there is a fail-safe. Here are my details:

Hello, I am using Debian 8 x64 with Apache/FPM and have compiled 5.6.31, 7.0.22 and 7.1.8 successfully (using the php.net TAR packages).

While reviewing the multi-byte functions and config options, I noticed that the value of mbstring.internal_encoding does not change in any of these versions if default_charset is set via ini_set() in the script, or even if it is changed directly in php.ini. Setting internal_encoding in php.ini does not change the mbstring value either.

The official documentation is quite limited, and the notes in the example php.ini do not seem to match the observed behaviour:

    ; Use of this INI entry is DEPRECATED, use global internal_encoding instead.
    ; internal/script encoding.
    ; Some encoding cannot work as internal encoding, e.g. SJIS, BIG5, ISO-2022-*
    ; If empty, default_charset or internal_encoding or iconv.internal_encoding is
    ; used.
    ; The precedence is:
    ; default_charset < internal_encoding < iconv.internal_encoding
    ;;mbstring.internal_encoding =

First, "iconv.internal_encoding" seems to be a copy&paste error, I assume. I guess it should be "mbstring.internal_encoding", right?

Second, the value I can read with mb_internal_encoding(), and the behaviour I see when I use, for example, implicit conversion via zend.multibyte=On or explicit conversion via mb_convert_encoding() without passing a source encoding as the third parameter, suggest that the value does not change unless mbstring.internal_encoding is set directly in php.ini or via ini_set(). Setting default_charset or internal_encoding in php.ini does NOT implicitly change mbstring's internal encoding value in the PHP versions mentioned above (tested, for example, with ISO-8859-1 and ISO-8859-15).

Have I missed something in the docs, or is this unintended behaviour? It is hard to believe that this is some system-specific issue with my self-compiled versions, but it is also strange that all three major versions behave in the same way. (Maybe I have missed something.)

In contrast, iconv behaves as the comments in php.ini suggest. With default_charset = A, all three iconv encodings implicitly change to A. If I additionally set internal_encoding = B, iconv.internal_encoding also changes to B, whereas the remaining two still use A.

As far as I can see right now, mbstring's internal encoding can currently only be changed directly. It currently uses UTF-8 for all my PHP versions. What is the source of this default value? Is it the implicit DEFAULT value of default_charset, or is it the system locale, which also uses UTF-8 on my system? Either way, the usage of these options seems to be inconsistent.

--------

I would expect that changing/setting default_charset and/or internal_encoding in php.ini would also affect mbstring's internal default encoding (as the comments suggest and as it does for iconv).
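The inheritance expected here for mbstring can be sketched as a small stand-alone C model that follows the same fallback pattern as the iconv get_internal_encoding() quoted earlier in this thread. This is only an illustration under assumptions: struct encoding_settings, is_set() and effective_internal_encoding() are hypothetical names, not php-src code, and unlike the iconv snippet the model also treats an empty default_charset as unset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three INI values involved (not a php-src type). */
struct encoding_settings {
    const char *mbstring_internal_encoding; /* most specific, deprecated entry */
    const char *internal_encoding;          /* global internal_encoding */
    const char *default_charset;            /* least specific fallback */
};

/* A setting counts as "set" only if it is non-NULL and non-empty. */
static int is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/*
 * Return the first set value in precedence order:
 * default_charset < internal_encoding < mbstring.internal_encoding.
 * Returns "" if nothing is set, mirroring the iconv snippet.
 */
const char *effective_internal_encoding(const struct encoding_settings *s)
{
    assert(s != NULL);
    if (is_set(s->mbstring_internal_encoding)) {
        return s->mbstring_internal_encoding;
    }
    if (is_set(s->internal_encoding)) {
        return s->internal_encoding;
    }
    if (is_set(s->default_charset)) {
        return s->default_charset;
    }
    return "";
}
```

With only default_charset set to ISO-8859-1, this model resolves to ISO-8859-1, which is the result the original report expected from mb_internal_encoding().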
Additionally, it's a shame that the ini comments provide more "insights" than the official documentation.

Note 1: It would be nice if phpinfo() displayed inherited values instead of "no value" in the table (e.g. for internal_encoding, mbstring.internal_encoding, etc.). Seeing the effective values would be helpful, for example "<UTF-8>" to mark that the value has not been explicitly set in php.ini but has instead been derived from another option in the inheritance chain. (The HTML output could also use CSS for special formatting.)

Note 2: The option default_mimetype takes its additional charset information ("Content-Type: text/html; charset=utf-8") only from the option default_charset. With the option output_encoding being available, it would be more straightforward to use that one as the primary source, which implicitly defaults to default_charset (if it is not explicitly set).

Best regards, Gert

Tested the behavior for self-compiled PHP 7.0.23, 7.1.9 and 7.2.0RC2 and it's the same problem (on Debian 8 x64). Default or explicitly set values of default_charset and internal_encoding are not handed down to mbstring.internal_encoding. It remains at its default value 'UTF-8' unless it is changed directly. On the other hand, changes are correctly reflected by the iconv encodings.

Example:

    <?php
    header('Content-Type: text/plain');
    header('Cache-Control: no-cache, no-store, must-revalidate');
    error_reporting(E_ALL);
    echo "mbstring internal encoding inheritance test: PHP " . PHP_VERSION . date(' , Y-m-d H:i:s') . "\n";
    echo "\nBEFORE\n";
    echo "default_charset : '" . ini_get('default_charset') . "'\n";
    echo "input_encoding : '" . ini_get('input_encoding') . "'\n";
    echo "internal_encoding : '" . ini_get('internal_encoding') . "'\n";
    echo "output_encoding : '" . ini_get('output_encoding') . "'\n";
    echo "mb_internal_encoding(): '" . mb_internal_encoding() . "'\n";
    echo "iconv_get_encoding() :\n" . var_export(iconv_get_encoding('all'), TRUE) . "\n";
    sleep(1);
    ini_set('default_charset', 'Windows-1252');
    ini_set('output_encoding', 'ISO-8859-15');
    echo "\nAFTER\n";
    echo "default_charset : '" . ini_get('default_charset') . "'\n";
    echo "input_encoding : '" . ini_get('input_encoding') . "'\n";
    echo "internal_encoding : '" . ini_get('internal_encoding') . "'\n";
    echo "output_encoding : '" . ini_get('output_encoding') . "'\n";
    echo "mb_internal_encoding(): '" . mb_internal_encoding() . "'\n";
    echo "iconv_get_encoding() :\n" . var_export(iconv_get_encoding('all'), TRUE) . "\n";
    ?>

All versions have been compiled with the option --enable-mbstring, whereas iconv is enabled by default. For this test, default_charset = "GB18030" has been set in php.ini, and mbstring.internal_encoding and all three iconv encoding options remain empty/unset in php.ini:

    mbstring internal encoding inheritance test: PHP 7.2.0RC2 , 2017-09-27 08:47:18

    BEFORE
    default_charset : 'GB18030'
    input_encoding : ''
    internal_encoding : ''
    output_encoding : ''
    mb_internal_encoding(): 'UTF-8'
    iconv_get_encoding() :
    array (
      'input_encoding' => 'GB18030',
      'output_encoding' => 'GB18030',
      'internal_encoding' => 'GB18030',
    )

    AFTER
    default_charset : 'Windows-1252'
    input_encoding : ''
    internal_encoding : ''
    output_encoding : 'ISO-8859-15'
    mb_internal_encoding(): 'UTF-8'
    iconv_get_encoding() :
    array (
      'input_encoding' => 'Windows-1252',
      'output_encoding' => 'ISO-8859-15',
      'internal_encoding' => 'Windows-1252',
    )

The result is the same for all three PHP versions. Additionally, setting internal_encoding directly in php.ini does not implicitly change mbstring.internal_encoding.

POSSIBLE ISSUE with FPM? (permanent encoding changes)

(Re)starting the FPM instance (compiled without OPcache, with no other instance or version running) and only setting default_charset = "GB18030" in php.ini returns the output above, initially ("BEFORE") with 'iconv.output_encoding' => 'GB18030' (as expected).
If I reload the page, even multiple times, I now get ("BEFORE") 'iconv.output_encoding' => 'ISO-8859-15' for every following request, until I perform a systemctl reload or systemctl restart for this FPM instance.

--> It seems as if the FPM instance permanently stores indirect in-script changes to the iconv encodings (and not only for the duration of the script).

The behavior is reproducible. If php.ini uses 'Windows-1251' instead, the first request returns this for 'iconv.output_encoding', but the following requests always return 'ISO-8859-15' in the "BEFORE" section. If I extend the test script and also set 'internal_encoding' and 'input_encoding' explicitly, these changes are also handed over to iconv and are permanently stored for this FPM instance.

The problem is also independent of the browser. For example, the first request shows the initial values in Chrome; the second request, initiated via Firefox, now shows altered initial values for all three iconv encodings.

I noted this behavior in PHP 7.0.23, 7.1.9 and also 7.2.0RC2. Additionally, self-compiled PHP 5.6.31 behaves more strangely. It also seems to permanently store inherited iconv changes in its FPM instance, but the initial "BEFORE" iconv values seem to change randomly with every following request and additionally display corrupted (non-ASCII) strings in the output. (The "AFTER" values are displayed correctly.)

--> On the other hand, using the deprecated function iconv_set_encoding() does not result in permanent storage within the FPM instance; changes are only active for the duration of the script (as they should be). It seems that the code propagating values from default_charset and the *_encoding ini options to iconv replaces the initially read, global php.ini settings in the FPM instance instead of only modifying the script's runtime values.

--> For PHP 5.6.31 it seems that the already invalid permanent storage additionally uses malformed C code to copy/replace the encoding name strings.
It seems as if the code does not properly recognize the beginning of the strings and starts copying in the middle of the strings, or even copies from some random/invalid pointer?!

--> Also note that permanent storage in FPM instances might also affect mbstring.internal_encoding value inheritance, IF this functionality WOULD work as it should (if the code is the same as for iconv and has been copied and only adjusted).

Finally, I want to mention that I cannot rule out that this observed behavior is connected to my specific system. All PHP versions are compiled mostly statically, resulting in a binary of approx. 40 MB. PHP 5.6.31 and 7.0.23 are compiled against updated distro packages only; PHP 7.1.9 and 7.2.0RC2 additionally use custom-built libxml 2.9.4, libxslt 1.1.29 and tidy 5.4.0; and 7.2.0RC2 additionally uses libargon2 20161029, libsodium 1.0.13 and libzip 1.3.0. I cannot imagine how any of these libs could affect this behavior.

Best regards, Gert
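The FPM hypothesis in the comment above (a runtime change replacing the value read from php.ini instead of only the script's runtime value) can be illustrated with a toy C model of a long-lived worker. Everything here is an assumption made for illustration: struct ini_entry and all function names are hypothetical and do not correspond to php-src code, and the model does not claim to show where the actual defect lies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ENC_MAX 32

/* Hypothetical model of one INI entry inside a long-lived worker process. */
struct ini_entry {
    char startup_value[ENC_MAX]; /* read from php.ini once, at worker start */
    char request_value[ENC_MAX]; /* value visible to the running script */
};

/* Copy an encoding name into a fixed-size buffer, always NUL-terminated. */
static void set_name(char *dst, const char *src)
{
    snprintf(dst, ENC_MAX, "%s", src);
}

/* Worker start: both values come from php.ini. */
void worker_start(struct ini_entry *e, const char *ini_value)
{
    assert(e != NULL && ini_value != NULL);
    set_name(e->startup_value, ini_value);
    set_name(e->request_value, ini_value);
}

/* Runtime change that touches only the per-request value (expected behavior). */
void runtime_set(struct ini_entry *e, const char *value)
{
    set_name(e->request_value, value);
}

/* Runtime change that also overwrites the startup value (suspected behavior). */
void runtime_set_leaky(struct ini_entry *e, const char *value)
{
    set_name(e->request_value, value);
    set_name(e->startup_value, value);
}

/* End of request: the next script starts again from the startup value. */
void request_end(struct ini_entry *e)
{
    set_name(e->request_value, e->startup_value);
}

/* Return 1 if the running script currently sees the given encoding name. */
int request_sees(const struct ini_entry *e, const char *name)
{
    return strcmp(e->request_value, name) == 0;
}
```

In this model, calling runtime_set() with ISO-8859-15 followed by request_end() leaves the next request at GB18030, while runtime_set_leaky() makes every later request start at ISO-8859-15 until the worker is restarted, which matches the pattern reported for the "BEFORE" values.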