Bug #60306 Characters lost while converting from cp936 to utf8
Submitted: 2011-11-15 07:29 UTC Modified: 2011-11-18 08:52 UTC
From: laruence@php.net Assigned: laruence (profile)
Status: Closed Package: mbstring related
PHP Version: 5.4.0RC1 OS:
Private report: No CVE-ID: None
 [2011-11-15 07:29 UTC] laruence@php.net
Description:
------------
With the same script and the same ini settings, 5.4 produces a wrong result.





Test script:
---------------
<?php
declare(encoding="cp936");
$s = "洪仁玕";
var_dump($s);
?>

Save the test script with fenc=cp936 (that is, with the file itself encoded as cp936).

run:
php53 -dmbstring.internal_encoding=utf8 test.php 
and
php54 -dmbstring.internal_encoding=utf8 test.php 

Expected result:
----------------
string(9) "洪仁玕"

Actual result:
--------------
5.3 works fine, but 5.4 outputs:
string(3) "洪"
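
The string(N) figures printed by var_dump are byte lengths, not character counts, so the gap between the expected and actual results corresponds to two whole characters going missing. A minimal sketch of that arithmetic, assuming the snippet is saved as UTF-8 (unlike the cp936 test script above) and that mbstring is loaded; the variable names are illustrative only:

```php
<?php
// Assumption: this file is saved as UTF-8, so the literals below are UTF-8 bytes.
$expected = "洪仁玕"; // the string 5.3 prints
$actual   = "洪";     // the string 5.4.0RC1 prints

// strlen() counts bytes; each of these characters occupies 3 bytes in UTF-8.
var_dump(strlen($expected));             // int(9)
var_dump(mb_strlen($expected, "UTF-8")); // int(3)
var_dump(strlen($actual));               // int(3)
```

So string(3) means one character survived the conversion and six bytes (two characters) were dropped.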

 [2011-11-15 07:37 UTC] laruence@php.net
Dmitry, please take a look at this, thanks :)
 [2011-11-15 07:37 UTC] laruence@php.net
-Assigned To: +Assigned To: dmitry
 [2011-11-15 08:27 UTC] laruence@php.net
If I change the script to:
<?php
declare(encoding="gb2312");
$s = "洪仁玕";
var_dump($s);
?>

and again save it with fenc=cp936, a memory leak is reported:

$ php54 -d mbstring.internal_encoding=utf8 -dzend.script_encoding=cp936 /tmp/1.php
string(7) "洪仁?"
[Tue Nov 15 16:26:36 2011]  Script:  '/tmp/1.php'
***php-src/trunk/ext/mbstring/mbstring.c(612) :  Freeing 0x2A95DDDF68 (131 bytes), script=/tmp/1.php
=== Total 1 memory leaks detected ===
 [2011-11-15 08:28 UTC] laruence@php.net
The same script as above triggers an abort when run this way:

$ php54 -d mbstring.internal_encoding=cp936 /tmp/1.php

php: Zend/zend_language_scanner.l:126: encoding_filter_script_to_internal: Assertion `internal_encoding && zend_multibyte_check_lexer_compatibility(internal_encoding)' failed.
Aborted (core dumped)
 [2011-11-17 09:04 UTC] laruence@php.net
It seems the characters are lost in mbfl_buffer_converter_feed2 (called from zend_multibyte_encoding_converter).
 [2011-11-18 05:05 UTC] laruence@php.net
Actually, there is a simpler reproduction script:
<?php
$s = "洪仁";
var_dump(mb_convert_encoding($s, "utf8", "gbk"));
?>

Save the script with fenc=cp936.

Then, in PHP 5.4, this results in:
string(3) "洪"

Characters are lost while converting from cp936 to utf8. This is a really big problem.
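
The reproduction above depends on the script file being saved as cp936. As a hedged alternative, the same conversion path can be exercised from a UTF-8 source file by building the cp936 bytes at runtime and converting them back. This is only a sketch: lost_characters() is a hypothetical helper name, not part of mbstring, and it assumes the file is saved as UTF-8 with the mbstring extension loaded.

```php
<?php
// Hypothetical helper: round-trips a UTF-8 string through another encoding
// and returns the characters that did not survive the trip.
function lost_characters(string $utf8, string $encoding = "CP936"): array
{
    // UTF-8 -> legacy encoding, then legacy encoding -> UTF-8
    // (the second step is the direction reported in this bug).
    $legacy = mb_convert_encoding($utf8, $encoding, "UTF-8");
    $back   = mb_convert_encoding($legacy, "UTF-8", $encoding);

    // Split both strings into individual characters.
    $in  = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $out = preg_split('//u', $back, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Characters present in the input but absent from the round-trip result.
    return array_values(array_diff($in, $out));
}

// Assumption: this file is saved as UTF-8.
var_dump(lost_characters("洪仁玕"));
```

On a build that contains the fix, the result should be an empty array. Note that a character with no cp936 representation at all would also be reported as lost, so the check is only meaningful for input that the target encoding can represent.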
 [2011-11-18 08:40 UTC] laruence@php.net
-Assigned To: dmitry +Assigned To: laruence
 [2011-11-18 08:40 UTC] laruence@php.net
I have made a fix, so I am re-assigning this to myself. Thanks, Dmitry ;)
 [2011-11-18 08:45 UTC] laruence@php.net
-Summary: zend-multibyte failed in 5.4 +Summary: Characters lost while converting from cp936 to utf8
 [2011-11-18 08:50 UTC] laruence@php.net
Automatic comment from SVN on behalf of laruence
Revision: http://svn.php.net/viewvc/?view=revision&revision=319452
Log: Fixed bug #60306 (Characters lost while converting from cp936 to utf8)
 [2011-11-18 08:52 UTC] laruence@php.net
This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

 For Windows:

http://windows.php.net/snapshots/
 
Thank you for the report, and for helping us make PHP better.


 [2011-11-18 08:52 UTC] laruence@php.net
-Status: Assigned +Status: Closed
 [2012-04-18 09:47 UTC] laruence@php.net
Automatic comment on behalf of laruence
Revision: http://git.php.net/?p=php-src.git;a=commit;h=601407aa25712a47ae4d59e2046d822fe93eb501
Log: Fixed bug #60306 (Characters lost while converting from cp936 to utf8)
 [2012-07-24 23:38 UTC] rasmus@php.net
Automatic comment on behalf of laruence
Revision: http://git.php.net/?p=php-src.git;a=commit;h=601407aa25712a47ae4d59e2046d822fe93eb501
Log: Fixed bug #60306 (Characters lost while converting from cp936 to utf8)
 [2013-11-17 09:35 UTC] laruence@php.net
Automatic comment on behalf of laruence
Revision: http://git.php.net/?p=php-src.git;a=commit;h=601407aa25712a47ae4d59e2046d822fe93eb501
Log: Fixed bug #60306 (Characters lost while converting from cp936 to utf8)
 