Bug #60306 Characters lost while converting from cp936 to utf8
Submitted: 2011-11-15 07:29 UTC Modified: 2011-11-18 08:52 UTC
From: laruence@php.net Assigned: laruence (profile)
Status: Closed Package: mbstring related
PHP Version: 5.4.0RC1 OS:
Private report: No CVE-ID: None
 [2011-11-15 07:29 UTC] laruence@php.net
Description:
------------
With the same script and the same ini settings, PHP 5.4 produces a wrong result.





Test script:
---------------
<?php
declare(encoding="cp936");
$s = "洪仁玕";
var_dump($s);
?>

Save the test script with fenc=cp936 (file encoding CP936).

Run:
php53 -dmbstring.internal_encoding=utf8 test.php 
and
php54 -dmbstring.internal_encoding=utf8 test.php 

Expected result:
----------------
string(9) "洪仁玕"

Actual result:
--------------
5.3 works fine, but 5.4 outputs:
string(3) "洪"
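
The reproduction above depends on the editor saving the file as cp936. As a rough sketch of an editor-independent check, the snippet below assumes the file is saved as UTF-8 instead, builds the CP936 bytes at runtime, and converts them back. The variable names are illustrative only, and this exercises mb_convert_encoding directly rather than the declare(encoding=...) path used in the test script.

```php
<?php
// Assumption: this file is saved as UTF-8, unlike the cp936 test file in the report.
$expected = "洪仁玕";

// Build the CP936 byte sequence at runtime instead of relying on the file encoding.
$cp936 = mb_convert_encoding($expected, "CP936", "UTF-8");

// Convert back in the direction the report is about (cp936 -> utf8).
$actual = mb_convert_encoding($cp936, "UTF-8", "CP936");

var_dump(strlen($cp936));          // byte length of the CP936 form
var_dump($actual);                 // should match the expected result above
var_dump($actual === $expected);   // false would mean characters were lost
```

On a build without the bug, the round trip should return the original 9-byte UTF-8 string.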

 [2011-11-15 07:37 UTC] laruence@php.net
dmitry, please take a look at this, thanks :)
 [2011-11-15 07:37 UTC] laruence@php.net
-Assigned To: +Assigned To: dmitry
 [2011-11-15 08:27 UTC] laruence@php.net
If I change the script to:
<?php
declare(encoding="gb2312");
$s = "洪仁玕";
var_dump($s);
?>

and again save it with fenc=cp936, a memory leak is reported:

$ php54 -d mbstring.internal_encoding=utf8 -dzend.script_encoding=cp936 /tmp/1.php
string(7) "洪仁?"
[Tue Nov 15 16:26:36 2011]  Script:  '/tmp/1.php'
***php-src/trunk/ext/mbstring/mbstring.c(612) :  Freeing 0x2A95DDDF68 (131 bytes), script=/tmp/1.php
=== Total 1 memory leaks detected ===
 [2011-11-15 08:28 UTC] laruence@php.net
The same script as above triggers an abort when run this way:

$ php54 -d mbstring.internal_encoding=cp936 /tmp/1.php

php: Zend/zend_language_scanner.l:126: encoding_filter_script_to_internal: Assertion `internal_encoding && zend_multibyte_check_lexer_compatibility(internal_encoding)' failed.
Aborted (core dumped)
 [2011-11-17 09:04 UTC] laruence@php.net
It seems the characters are lost in mbfl_buffer_converter_feed2 (called from zend_multibyte_encoding_converter).
 [2011-11-18 05:05 UTC] laruence@php.net
Actually, there is a simpler reproduction script:
<?php
$s = "洪仁";
var_dump(mb_convert_encoding($s, "utf8", "gbk"));
?>

Save the script with fenc=cp936.

Then, in PHP 5.4, this results in:
string(3) "洪"

Characters are lost while converting from cp936 to utf8. This is a really big problem.
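
Since the failure here is silent truncation, a caller could guard a conversion by comparing character counts before and after. The following is only a sketch: convert_checked is a hypothetical helper, not part of mbstring, it assumes PHP 7 or later for the type declarations, and it assumes the file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not an mbstring function): convert, then verify that
// the number of characters is the same before and after the conversion.
function convert_checked(string $bytes, string $to, string $from): string
{
    $out = mb_convert_encoding($bytes, $to, $from);
    if ($out === false) {
        throw new RuntimeException("conversion from $from to $to failed");
    }
    $before = mb_strlen($bytes, $from);  // characters in the input
    $after  = mb_strlen($out, $to);      // characters in the output
    if ($before !== $after) {
        throw new RuntimeException("lost characters: $before in, $after out");
    }
    return $out;
}

// Assumption: this file is UTF-8, so the CP936 input is built at runtime.
$gbk  = mb_convert_encoding("洪仁", "CP936", "UTF-8");
$utf8 = convert_checked($gbk, "UTF-8", "CP936");
var_dump($utf8);
```

A count comparison like this would catch dropped characters such as the string(3) result above, but not a character that is replaced by a substitute such as "?".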
 [2011-11-18 08:40 UTC] laruence@php.net
-Assigned To: dmitry +Assigned To: laruence
 [2011-11-18 08:40 UTC] laruence@php.net
I have made a fix, so I am reassigning this to myself. Thanks, dmitry ;)
 [2011-11-18 08:45 UTC] laruence@php.net
-Summary: zend-multibyte failed in 5.4 +Summary: Characters lost while converting from cp936 to utf8
 [2011-11-18 08:50 UTC] laruence@php.net
Automatic comment from SVN on behalf of laruence
Revision: http://svn.php.net/viewvc/?view=revision&revision=319452
Log: Fixed bug #60306 (Characters lost while converting from cp936 to utf8)
 [2011-11-18 08:52 UTC] laruence@php.net
This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

 For Windows:

http://windows.php.net/snapshots/
 
Thank you for the report, and for helping us make PHP better.


 [2011-11-18 08:52 UTC] laruence@php.net
-Status: Assigned +Status: Closed
 [2012-04-18 09:47 UTC] laruence@php.net
Automatic comment on behalf of laruence
Revision: http://git.php.net/?p=php-src.git;a=commit;h=601407aa25712a47ae4d59e2046d822fe93eb501
Log: Fixed bug #60306 (Characters lost while converting from cp936 to utf8)