Bug #49638 mbstring default behaviour is unexpected
Submitted: 2009-09-23 09:43 UTC
Modified: 2017-10-24 05:44 UTC
Votes:2
Avg. Score:4.0 ± 1.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: olivier at oxeva dot fr
Assigned: cataphract
Status: Assigned
Package: *General Issues
PHP Version: 5.3SVN-2009-09-23 (snap)
OS: Linux
Private report: No
CVE-ID: None
 [2009-09-23 09:43 UTC] olivier at oxeva dot fr
Description:
------------
The default behaviour of mbstring regarding PHP source file encoding is unexpected: if the source file is in UTF-8 (with BOM), the file is converted to latin1 when mbstring.internal_encoding is left at its default value.


Reproduce code:
---------------
All tests below were run on a file named "testmbstring.php":

http://www.ajeux.com/phptests/testmbstring.phps (md5sum: e663d28964a20ec404e68226effc27d0)


1/ PHP 5.3.0 WITHOUT mbstring
'./configure'  '--enable-zend-multibyte'
(on a file without the var_dump())
Hexdump:
00000000  31 32 33 c3 a9 31 32 33  e2 82 ac 31 32 33        |123..123...123|
0000000e

=> Result is OK: the non-ASCII characters are encoded as 2 bytes (é, c3 a9) and 3 bytes (€, e2 82 ac).


2/ PHP 5.3.0 WITH mbstring
'./configure'  '--enable-zend-multibyte' '--enable-mbstring'

  a) No specific settings in php.ini
php -i returned the following:
mbstring.detect_order => no value => no value
mbstring.encoding_translation => Off => Off
mbstring.func_overload => 0 => 0
mbstring.http_input => pass => pass
mbstring.http_output => pass => pass
mbstring.http_output_conv_mimetypes => ^(text/|application/xhtml\+xml) => ^(text/|application/xhtml\+xml)
mbstring.internal_encoding => no value => no value
mbstring.language => neutral => neutral
mbstring.script_encoding => no value => no value
mbstring.strict_detection => Off => Off
mbstring.substitute_character => no value => no value

Hexdump:
00000000  73 74 72 69 6e 67 28 31  30 29 20 22 49 53 4f 2d  |string(10) "ISO-|
00000010  38 38 35 39 2d 31 22 0a  31 32 33 e9 31 32 33 3f  |8859-1".123.123?|
00000020  31 32 33                                          |123|
00000023


=> By default, internal_encoding is latin1 (ISO-8859-1). The output is no longer UTF-8: é becomes the single byte e9, and € becomes "?" (3f).
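
The bytes in tests 1 and 2a are consistent with a plain UTF-8 to ISO-8859-1 conversion in which unrepresentable characters are replaced by "?". The Python sketch below reproduces both hexdumps from the string "123é123€123", which is inferred from the test 1 bytes. It only mimics the observed output and makes no claim about how mbstring performs the conversion internally; the `hexdump` helper is a hypothetical name used for illustration.

```python
# String inferred from the test 1 hexdump: c3 a9 is "é", e2 82 ac is "€".
text = "123\u00e9123\u20ac123"

# Test 1: the text encoded as UTF-8 (what the source file contains).
utf8_bytes = text.encode("utf-8")

# Test 2a: the same text converted to ISO-8859-1. "€" has no latin1 code
# point, so errors="replace" substitutes "?" (0x3f).
latin1_bytes = text.encode("iso-8859-1", errors="replace")


def hexdump(data: bytes) -> str:
    """Return the bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


print(hexdump(utf8_bytes))    # 31 32 33 c3 a9 31 32 33 e2 82 ac 31 32 33
print(hexdump(latin1_bytes))  # 31 32 33 e9 31 32 33 3f 31 32 33
```

The second line matches the body of the 2a output (after the var_dump() text), which suggests the script content was transcoded rather than passed through.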


   b) With mbstring.internal_encoding="UTF-8"
php -i returned the following:
mbstring.detect_order => no value => no value
mbstring.encoding_translation => Off => Off
mbstring.func_overload => 0 => 0
mbstring.http_input => pass => pass
mbstring.http_output => pass => pass
mbstring.http_output_conv_mimetypes => ^(text/|application/xhtml\+xml) => ^(text/|application/xhtml\+xml)
mbstring.internal_encoding => UTF-8 => UTF-8
mbstring.language => neutral => neutral
mbstring.script_encoding => no value => no value
mbstring.strict_detection => Off => Off
mbstring.substitute_character => no value => no value

Hexdump output:
00000000  73 74 72 69 6e 67 28 35  29 20 22 55 54 46 2d 38  |string(5) "UTF-8|
00000010  22 0a 31 32 33 c3 a9 31  32 33 e2 82 ac 31 32 33  |".123..123...123|
00000020

=> OK


   c) With var_dump(mb_internal_encoding('UTF-8')) at the top of the file (and no changes in php.ini)
php -i: same output as in 2a.

00000000  62 6f 6f 6c 28 74 72 75  65 29 0a 73 74 72 69 6e  |bool(true).strin|
00000010  67 28 35 29 20 22 55 54  46 2d 38 22 0a 31 32 33  |g(5) "UTF-8".123|
00000020  e9 31 32 33 3f 31 32 33                           |.123?123|
00000028


=> Despite the mb_internal_encoding('UTF-8') call, the output is not UTF-8.
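
One way to confirm which outputs are still UTF-8 is to try decoding the body bytes from each hexdump. The sketch below does this in Python for the bodies of tests 2b and 2c, with the var_dump() prefix stripped by hand; `is_valid_utf8` is a hypothetical helper, not part of any PHP tooling.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Script output bodies copied from the hexdumps above.
body_2b = bytes.fromhex("313233c3a9313233e282ac313233")
body_2c = bytes.fromhex("313233e93132333f313233")

print(is_valid_utf8(body_2b))  # True
print(is_valid_utf8(body_2c))  # False: e9 is not followed by continuation bytes
```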





Expected result:
----------------
mbstring should not change the source file encoding if the user has not specified an internal encoding (mbstring.internal_encoding => no value).

If this behaviour is intended, phpinfo() (php -i) should at least show the user that the default internal_encoding is latin1 (ISO-8859-1).


Also, the mb_internal_encoding('UTF-8') function should take effect on the current file (see test 2c). Users should not be forced to change internal_encoding through php.ini, since not all users have access to it.


Actual result:
--------------
Versions tested, all showing the same behaviour:
- PHP 5.3.0
- PHP 5.3.0 snap 200909230830
- PHP 5.2.3
- PHP 5.2.11

History

 [2009-09-23 14:11 UTC] olivier at oxeva dot fr
Just to be clear:
When I said "file is converted to latin1" (bug description), I did not mean that the file is rewritten on disk. I meant that the _output_ is in latin1.
 [2010-11-19 01:46 UTC] cataphract@php.net
-Status: Open +Status: Feedback -Package: Feature/Change Request +Package: *General Issues
 [2010-11-19 01:46 UTC] cataphract@php.net
The test file is no longer online. Can you provide it?
 [2010-11-19 12:00 UTC] olivier at oxeva dot fr
Hi,
Test has been restored :)

Olivier
 [2010-11-23 04:53 UTC] cataphract@php.net
-Status: Feedback +Status: Verified -Assigned To: +Assigned To: cataphract
 [2010-11-23 05:05 UTC] cataphract@php.net
-Type: Feature/Change Request +Type: Bug
 [2010-11-23 05:05 UTC] cataphract@php.net
Changing this to bug.
 [2013-12-05 19:44 UTC] mike@php.net
Why is this a "General Issue" instead of mbstring?
 [2017-10-24 05:44 UTC] kalle@php.net
-Status: Verified +Status: Assigned
 