Bug #49638 mbstring default behaviour is unexpected
Submitted: 2009-09-23 09:43 UTC Modified: 2021-01-31 04:22 UTC
Votes:7
Avg. Score:4.1 ± 1.0
Reproduced:4 of 5 (80.0%)
Same Version:2 (50.0%)
Same OS:2 (50.0%)
From: olivier at oxeva dot fr Assigned: cmb
Status: No Feedback Package: mbstring related
PHP Version: 5.3SVN-2009-09-23 (snap) OS: Linux
Private report: No CVE-ID: None

 [2009-09-23 09:43 UTC] olivier at oxeva dot fr
Description:
------------
The default behaviour of mbstring regarding the encoding of PHP source files is unexpected. If the source file is in UTF-8 (with BOM), the script's output is converted to Latin-1 (ISO-8859-1) while mbstring.internal_encoding is left at its default value.


Reproduce code:
---------------
All tests below were run on a file named "testmbstring.php":

http://www.ajeux.com/phptests/testmbstring.phps (md5sum: e663d28964a20ec404e68226effc27d0)


1/ PHP 5.3.0 WITHOUT mbstring
'./configure'  '--enable-zend-multibyte'
(on a file without the var_dump())
Hexdump:
00000000  31 32 33 c3 a9 31 32 33  e2 82 ac 31 32 33        |123..123...123|
0000000e

=> Result is OK: the non-ASCII characters are encoded in 2 bytes (é = c3 a9) and 3 bytes (€ = e2 82 ac).
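As a cross-check, the expected bytes can be rebuilt outside PHP. This Python sketch assumes the test string is "123é123€123". That string is inferred from the hexdump, not copied from testmbstring.phps.

```python
# Sketch: rebuild the UTF-8 bytes seen in the test 1 hexdump.
# Assumption: the script prints "123é123€123" (inferred from the hexdump).
TEST_STRING = "123\u00e9123\u20ac123"


def hex_bytes(data: bytes) -> str:
    """Space-separated lowercase hex, like the byte columns of `hexdump -C`."""
    return " ".join(f"{b:02x}" for b in data)


utf8 = TEST_STRING.encode("utf-8")

print(hex_bytes(utf8))  # 31 32 33 c3 a9 31 32 33 e2 82 ac 31 32 33
print(len(utf8))        # 14, i.e. the 0x0e end offset in the hexdump
```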


2/ PHP 5.3.0 WITH mbstring
'./configure'  '--enable-zend-multibyte' '--enable-mbstring'

  a) No specific settings in php.ini
php -i returned the following:
mbstring.detect_order => no value => no value
mbstring.encoding_translation => Off => Off
mbstring.func_overload => 0 => 0
mbstring.http_input => pass => pass
mbstring.http_output => pass => pass
mbstring.http_output_conv_mimetypes => ^(text/|application/xhtml\+xml) => ^(text/|application/xhtml\+xml)
mbstring.internal_encoding => no value => no value
mbstring.language => neutral => neutral
mbstring.script_encoding => no value => no value
mbstring.strict_detection => Off => Off
mbstring.substitute_character => no value => no value

Hexdump:
00000000  73 74 72 69 6e 67 28 31  30 29 20 22 49 53 4f 2d  |string(10) "ISO-|
00000010  38 38 35 39 2d 31 22 0a  31 32 33 e9 31 32 33 3f  |8859-1".123.123?|
00000020  31 32 33                                          |123|
00000023


=> The default internal_encoding is Latin-1 (ISO-8859-1). é is output as the single byte e9, and €, which has no Latin-1 equivalent, is replaced by "?" (3f). The output is no longer UTF-8.
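The 2a bytes (after the var_dump() line) are consistent with a UTF-8 to ISO-8859-1 conversion in which characters that Latin-1 cannot represent become "?". This Python sketch only mimics that result. It is not mbstring's code path, and it reuses the same inferred test string "123é123€123" (an assumption, not taken from the test file).

```python
# Sketch: mimic the 2a output with a lossy UTF-8 -> ISO-8859-1 conversion.
# Assumption: the script prints "123é123€123" (inferred from the hexdumps).
TEST_STRING = "123\u00e9123\u20ac123"

# errors="replace" makes Python's encoder emit b"?" for characters that
# ISO-8859-1 cannot encode (here U+20AC EURO SIGN).
latin1 = TEST_STRING.encode("iso-8859-1", errors="replace")

print(" ".join(f"{b:02x}" for b in latin1))
# 31 32 33 e9 31 32 33 3f 31 32 33
```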


   b) With mbstring.internal_encoding="UTF-8" in php.ini
php -i returned the following:
mbstring.detect_order => no value => no value
mbstring.encoding_translation => Off => Off
mbstring.func_overload => 0 => 0
mbstring.http_input => pass => pass
mbstring.http_output => pass => pass
mbstring.http_output_conv_mimetypes => ^(text/|application/xhtml\+xml) => ^(text/|application/xhtml\+xml)
mbstring.internal_encoding => UTF-8 => UTF-8
mbstring.language => neutral => neutral
mbstring.script_encoding => no value => no value
mbstring.strict_detection => Off => Off
mbstring.substitute_character => no value => no value

Hexdump output:
00000000  73 74 72 69 6e 67 28 35  29 20 22 55 54 46 2d 38  |string(5) "UTF-8|
00000010  22 0a 31 32 33 c3 a9 31  32 33 e2 82 ac 31 32 33  |".123..123...123|
00000020

=> OK


   c) With var_dump(mb_internal_encoding('UTF-8')) at the top of the file (and no changes in php.ini)
php -i output: same as 2a.

Hexdump:

00000000  62 6f 6f 6c 28 74 72 75  65 29 0a 73 74 72 69 6e  |bool(true).strin|
00000010  67 28 35 29 20 22 55 54  46 2d 38 22 0a 31 32 33  |g(5) "UTF-8".123|
00000020  e9 31 32 33 3f 31 32 33                           |.123?123|
00000028


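=> Although mb_internal_encoding('UTF-8') returns true and the internal encoding is then reported as "UTF-8", the output is still not UTF-8 (the string bytes are the same as in 2a).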
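One way to confirm the verdicts on 2b and 2c is to run the captured bytes through a strict UTF-8 decoder. In this Python sketch, the byte strings are copied from the 2b and 2c hexdumps. `is_valid_utf8` is a hypothetical helper name used only for illustration.

```python
# Sketch: check whether captured script output is well-formed UTF-8.
# The hex strings are copied from the 2b and 2c hexdumps.
OUTPUT_2B = bytes.fromhex(
    "73 74 72 69 6e 67 28 35 29 20 22 55 54 46 2d 38"
    " 22 0a 31 32 33 c3 a9 31 32 33 e2 82 ac 31 32 33"
)
OUTPUT_2C = bytes.fromhex(
    "62 6f 6f 6c 28 74 72 75 65 29 0a 73 74 72 69 6e"
    " 67 28 35 29 20 22 55 54 46 2d 38 22 0a 31 32 33"
    " e9 31 32 33 3f 31 32 33"
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 in strict mode."""
    try:
        data.decode("utf-8")  # default errors="strict" raises on bad sequences
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(OUTPUT_2B))  # True
print(is_valid_utf8(OUTPUT_2C))  # False: 0xe9 is not followed by continuation bytes
```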





Expected result:
----------------
mbstring should not change the encoding of the script's output when the user has not specified an internal_encoding (mbstring.internal_encoding => no value).

If this behaviour is intended, phpinfo() (php -i) should at least show that the default internal_encoding is Latin-1 (ISO-8859-1).


Also, calling mb_internal_encoding('UTF-8') should take effect for the current file (see test 2c). Users should not be forced to change internal_encoding in php.ini, since not all users have access to it.


Actual result:
--------------
Versions tested, all with the same behaviour:
- PHP 5.3.0
- PHP 5.3.0 snap 200909230830
- PHP 5.2.3
- PHP 5.2.11

History

 [2009-09-23 14:11 UTC] olivier at oxeva dot fr
Just to be clear: when I said "file is converted to latin1" (bug description), I did not mean that the file is rewritten on disk. I mean that the _output_ is in Latin-1.
 [2010-11-19 01:46 UTC] cataphract@php.net
-Status: Open +Status: Feedback -Package: Feature/Change Request +Package: *General Issues
 [2010-11-19 01:46 UTC] cataphract@php.net
The test file is no longer online. Can you provide it?
 [2010-11-19 12:00 UTC] olivier at oxeva dot fr
Hi,
Test has been restored :)

Olivier
 [2010-11-23 04:53 UTC] cataphract@php.net
-Status: Feedback +Status: Verified -Assigned To: +Assigned To: cataphract
 [2010-11-23 05:05 UTC] cataphract@php.net
-Type: Feature/Change Request +Type: Bug
 [2010-11-23 05:05 UTC] cataphract@php.net
Changing this to bug.
 [2013-12-05 19:44 UTC] mike@php.net
Why is this a "General Issue" instead of mbstring?
 [2017-10-24 05:44 UTC] kalle@php.net
-Status: Verified +Status: Assigned
 [2018-03-26 13:35 UTC] cmb@php.net
-Package: *General Issues +Package: mbstring related
 [2021-01-18 21:48 UTC] sji at sj-i dot dev
Couldn't reproduce this on 7.4
 [2021-01-19 14:32 UTC] cmb@php.net
-Status: Assigned +Status: Feedback -Assigned To: cataphract +Assigned To: cmb
 [2021-01-19 14:32 UTC] cmb@php.net
I think this is no longer the case as of PHP 5.4.0, or can anybody
still reproduce this issue with any of the actively supported PHP
versions[1]?

[1] <https://www.php.net/supported-versions.php>
 [2021-01-31 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Mar 19 05:01:29 2024 UTC