Bug #40664 String conversion functions wrong for multibyte chars
Submitted: 2007-02-28 11:38 UTC Modified: 2020-02-03 08:55 UTC
Votes:138
Avg. Score:4.9 ± 0.3
Reproduced:39 of 39 (100.0%)
Same Version:9 (23.1%)
Same OS:35 (89.7%)
From: fjortiz at comunet dot es Assigned: cmb (profile)
Status: Duplicate Package: COM related
PHP Version: 5.2.1 OS: Win32
Private report: No CVE-ID: None
 [2007-02-28 11:38 UTC] fjortiz at comunet dot es
Description:
------------
When converting a UTF-8 encoded, multibyte string (Russian, for example) to a COM string, the wrong number of bytes is allocated, so the COM string is created with junk bytes at the end.

For example, when I pass my COM-based ADODB driver a 5-letter word in Russian, the destination receives a 10-character (5*2) string: the first 5 characters are the correct Russian characters and the remaining 5 are junk.

This was also explained for 4.4.2 in bug #37899
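
To illustrate the length mismatch described above, here is a minimal C sketch (not taken from the PHP source) that counts how many UTF-16 code units a UTF-8 string needs. The function name utf16_units_for_utf8 is hypothetical, and the code assumes well-formed UTF-8 input. A 5-letter Russian word occupies 10 bytes in UTF-8 but only 5 code units once converted, so a length derived from the byte count is twice what the converted text actually fills.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of UTF-16 code units needed for the first
 * byte_len bytes of a UTF-8 string (terminator not included).
 * Assumes the input is well-formed UTF-8. */
size_t utf16_units_for_utf8(const char *s, size_t byte_len)
{
    size_t units = 0;

    assert(s != NULL);
    for (size_t i = 0; i < byte_len; i++) {
        unsigned char c = (unsigned char)s[i];

        if ((c & 0xC0) == 0x80) {
            continue;                  /* continuation byte, already counted */
        }
        /* Lead bytes 0xF0 and above start 4-byte sequences, which need a
         * surrogate pair (2 code units); everything else needs 1. */
        units += (c >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

Called with the 10 UTF-8 bytes of a 5-letter Cyrillic word, the helper returns 5, which is the number of OLECHARs the converted string really contains.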


Actual result:
--------------
This is solved by patching two files:
\ext\com_dotnet\com_variant.c, function php_com_variant_from_zval, line 156:

156,157c156
< 			V_BSTR(v) = SysAllocString((char*)olestring);
---
> 			V_BSTR(v) = SysAllocStringByteLen((char*)olestring, Z_STRLEN_P(z) * sizeof(OLECHAR));

\ext\com_dotnet\com_olechar.c, function php_com_string_to_olestring:

37d36
< 	uint unicode_strlen;
39,40c38,44
< 	unicode_strlen = MultiByteToWideChar(codepage, (codepage == CP_UTF8 ?
< 		0 : MB_PRECOMPOSED | MB_ERR_INVALID_CHARS), string, -1, NULL, 0);
---
> 	if (string_len == -1) {
> 		/* determine required length for the buffer (includes NUL terminator) */
> 		string_len = MultiByteToWideChar(codepage, flags, string, -1, NULL, 0);
> 	} else {
> 		/* allow room for NUL terminator */
> 		string_len++;
> 	}
42,44c46,48
< 	if (unicode_strlen > 0) {
< 		olestring = (OLECHAR*)safe_emalloc(sizeof(OLECHAR), unicode_strlen, 0);
< 		ok = MultiByteToWideChar(codepage, flags, string, -1, olestring, unicode_strlen);
---
> 	if (string_len > 0) {
> 		olestring = (OLECHAR*)safe_emalloc(sizeof(OLECHAR), string_len, 0);
> 		ok = MultiByteToWideChar(codepage, flags, string, string_len, olestring, string_len);



 [2007-05-23 11:51 UTC] fjortiz at comunet dot es
Any updates about this? I'm showing you the code to fix the issue. Please tell me what else you may need to carry this into the main tree.

TIA
 [2007-05-28 15:07 UTC] tony2001@php.net
Unfortunately, the extension is currently unmaintained, so I don't think anybody is going to commit your patch in the near future.

 [2007-07-25 14:28 UTC] ameoba32 at gmail dot com
I found this bug too. The code looks like this:

$word = new COM("Word.Application", null, CP_UTF8);
$word->Visible = true;
$doc = $word->Documents->Add();
$word->Selection->TypeText( "UTF8 text here, russian in my case" );

In Word, "Normal text " followed by garbage appears.

Can this be fixed in CVS? I have to compile PHP on Win32 myself ;)
 [2007-08-15 08:36 UTC] jani@php.net
Assigned to the maintainer.
 [2008-10-22 14:05 UTC] vincent at eal dot com
Any update on this bug? I still have the problem with PHP 5.2.3 and the changelog does not indicate any fix in 5.2.6.

It is very annoying to have a nice patch standing there still with no one to commit it to CVS.
 [2009-03-22 10:12 UTC] j dot novak at netsystem dot cz
I see the same problem with ADODB.Connection and its parameters.
When I use:

// init connection to DB
$dbc = new COM('ADODB.Connection',null,CP_UTF8);
$dbc->Open("PROVIDER=MSDASQL;Driver={SQL Server};Server=192.168.210.1;Database=test;UID=test;PWD=test");
// create command
$oCmd = new COM('ADODB.Command',null,CP_UTF8);
$oCmd->ActiveConnection = $dbc;
// Table test1 has one column c1 of type nvarchar(200)
$oCmd->CommandText = "INSERT INTO test1(c1) VALUES (?)";
$oCmd->CommandType = 1;
// Some UTF-8 string (length in characters is 15, 24 in bytes)
$val='ABCříšúěďáéóXYZ';
$len=strlen($val);
$p=$oCmd->CreateParameter('name',202,1,$len,$val);
$oCmd->Parameters->Append($p);
$oCmd->Execute();
// ADODB sends to DB nvarchar(24) which is length of string
// in bytes not characters, but data has 15 characters
// in UCS-2. So in database there is correct string, but there is
// some garbage after end of string
 [2009-10-20 10:36 UTC] fjortiz at comunet dot es
After 2.5 years, can someone just commit the patch I provided to CVS? It's annoying to recompile every PHP version that comes out just for the sake of this nonsense. Take into account that any use of PHP+COM+multibyte stuff fails without this silly fix.

Thank you.
 [2009-10-20 10:50 UTC] pajoye@php.net
Taking this one over.
 [2010-08-01 17:45 UTC] ivars_ju at inbox dot lv
Hi,
I still have this bug in PHP version 5.3.3.

Is it so hard to commit this?
 [2012-08-08 07:24 UTC] sager at agitos dot de
The patch is still missing in PHP 5.3.15 (I don't know why, the patch is working: thanks to fjortiz!)

So I compiled a php_com_dotnet.dll that is compatible with http://windows.php.net/downloads/releases/php-5.3.15-nts-Win32-VC9-x86.msi

Download this compiled dll at http://www.agitos.de/pub/php-5.3.15-com_dotnet-patch.zip
 [2012-11-06 17:23 UTC] ddowns at online-access dot com
I went to apply the patch and it's already there in 5.3.18. Not sure when this was actually solved, but the bug status needs updating.
 [2014-07-17 07:09 UTC] fjortiz at comunet dot es
Some update: while working with 5.3.28, it seems to work better now, but there is still a length-related bug inside.

When inserting strings with accents or the Spanish "ñ" character from PHP into MSSQL Server through the COM driver (UTF-8 encoded), the insert appears to work, but if I inspect the last character in the column I find a "\0" in the final position, which makes all my "=" queries fail.

"\0" is the C language string terminator so there is clearly some kind of memory allocation mismatch in the code. 

Oddly enough, queries with "where column='textwithñfails'+char(0)" do work, so I'll use this trick as a workaround in the meantime.
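
As a sketch of why a trailing "\0" would break "=" comparisons while appending char(0) makes them match: the C fragment below models a length-counted string, loosely in the spirit of a BSTR. The counted_str type and counted_equal function are hypothetical and are not part of PHP or the COM API. It assumes, without confirmation from this report, that the stored length ended up counting the terminator as content.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical length-counted string: the length, not a terminator,
 * decides where the content ends. */
typedef struct {
    const uint16_t *data;  /* UTF-16 code units */
    size_t len;            /* number of code units treated as content */
} counted_str;

/* Two counted strings are equal only if both length and content match. */
bool counted_equal(counted_str a, counted_str b)
{
    assert(a.data != NULL && b.data != NULL);
    return a.len == b.len
        && memcmp(a.data, b.data, a.len * sizeof a.data[0]) == 0;
}
```

With a buffer holding three characters plus a terminator, a value stored with length 4 does not compare equal to the same text with length 3, but it does compare equal to another value that also carries the extra NUL, which mirrors the +char(0) workaround.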
 [2017-10-24 07:30 UTC] kalle@php.net
-Status: Assigned
+Status: Open
-Assigned To: pajoye
+Assigned To:
 [2020-02-03 08:55 UTC] cmb@php.net
-Status: Open
+Status: Duplicate
-Assigned To:
+Assigned To: cmb
 [2020-02-03 08:55 UTC] cmb@php.net
This appears to be a duplicate of bug #66431, which is fixed as of
PHP 5.4.29 and 5.5.13, respectively.
 