PHP :: Bug #40664 :: String conversion functions wrong for multibyte chars

Bug #40664

String conversion functions wrong for multibyte chars

Submitted:	2007-02-28 11:38 UTC
Modified:	2020-02-03 08:55 UTC

Votes:	138
Avg. Score:	4.9 ± 0.3
Reproduced:	39 of 39 (100.0%)
Same Version:	9 (23.1%)
Same OS:	35 (89.7%)

From:	fjortiz at comunet dot es
Assigned:	cmb
Status:	Duplicate
Package:	COM related
PHP Version:	5.2.1
OS:	Win32
Private report:
CVE-ID:	None


[2007-02-28 11:38 UTC] fjortiz at comunet dot es

Description:
------------
When converting a UTF-8 encoded multibyte string (Russian, for example) to a COM string, the wrong number of bytes is allocated, so the COM string is created with junk bytes at the end.

For example, when I pass my COM-based ADODB driver a 5-letter word in Russian, the destination receives a 10-character (5*2) string: the first 5 characters are the correct Russian letters and the remaining 5 are junk.

This was also explained for 4.4.2 in bug #37899
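
The mismatch described above comes down to two different lengths for the same text. As a minimal, portable sketch (this is not the code in ext/com_dotnet; the helper name `utf16_units` is hypothetical and the input is assumed to be valid UTF-8), the following C function counts how many UTF-16 code units a UTF-8 string needs, so that the result can be compared with its byte length:

```c
#include <stddef.h>

/* Hypothetical helper: number of UTF-16 code units needed for the
 * first `len` bytes of `s`. Assumes `s` is valid UTF-8. */
size_t utf16_units(const char *s, size_t len)
{
    size_t units = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = (unsigned char)s[i];
        if ((b & 0xC0) == 0x80) {
            continue;                 /* continuation byte: already counted */
        }
        units += (b >= 0xF0) ? 2 : 1; /* 4-byte sequences need a surrogate pair */
    }
    return units;
}
```

For a 5-letter Russian word such as "слово", which is 10 bytes in UTF-8, this returns 5. Sizing the COM string from the 10-byte length instead would leave room for 5 extra characters, which is consistent with the junk described in the report.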


Actual result:
--------------
This is solved by patching two files:
\ext\com_dotnet\com_variant.c, function php_com_variant_from_zval, line 156:

156,157c156
< 			V_BSTR(v) = SysAllocString((char*)olestring);
---
> 			V_BSTR(v) = SysAllocStringByteLen((char*)olestring, Z_STRLEN_P(z) * sizeof(OLECHAR));

\ext\com_dotnet\com_olechar.c, function php_com_string_to_olestring:

37d36
< 	uint unicode_strlen;
39,40c38,44
< 	unicode_strlen = MultiByteToWideChar(codepage, (codepage == CP_UTF8 ?
< 		0 : MB_PRECOMPOSED | MB_ERR_INVALID_CHARS), string, -1, NULL, 0);
---
> 	if (string_len == -1) {
> 		/* determine required length for the buffer (includes NUL terminator) */
> 		string_len = MultiByteToWideChar(codepage, flags, string, -1, NULL, 0);
> 	} else {
> 		/* allow room for NUL terminator */
> 		string_len++;
> 	}
42,44c46,48
< 	if (unicode_strlen > 0) {
< 		olestring = (OLECHAR*)safe_emalloc(sizeof(OLECHAR), unicode_strlen, 0);
< 		ok = MultiByteToWideChar(codepage, flags, string, -1, olestring, unicode_strlen);
---
> 	if (string_len > 0) {
> 		olestring = (OLECHAR*)safe_emalloc(sizeof(OLECHAR), string_len, 0);
> 		ok = MultiByteToWideChar(codepage, flags, string, string_len, olestring, string_len);
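
Both hunks concern the same question: how many UTF-16 code units the converted string actually has. The usual pattern with MultiByteToWideChar is to call it once with a NULL output buffer to measure, allocate, then call it again to convert. The sketch below imitates that two-call pattern in portable C so it can be tried outside Windows. It is an illustration only: `utf8_to_utf16` is a hypothetical stand-in, not a Win32 or PHP function, and it assumes valid UTF-8 input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable stand-in for the two-call MultiByteToWideChar
 * pattern. Assumes `src` is valid UTF-8. With dst == NULL it only
 * counts; otherwise it writes at most dst_cap units. Returns the number
 * of UTF-16 code units produced (no terminator is added). */
size_t utf8_to_utf16(const char *src, size_t src_len,
                     uint16_t *dst, size_t dst_cap)
{
    size_t i = 0, n = 0;

    while (i < src_len) {
        unsigned char b = (unsigned char)src[i];
        uint32_t cp;
        size_t extra;

        /* The lead byte gives the sequence length and the top bits. */
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }

        if (extra > src_len - i - 1) {
            break;                      /* truncated sequence: stop */
        }
        i++;
        while (extra-- > 0) {
            cp = (cp << 6) | ((unsigned char)src[i++] & 0x3F);
        }

        size_t need = (cp >= 0x10000) ? 2 : 1;
        if (dst != NULL) {
            if (n + need > dst_cap) {
                break;                  /* output buffer full: stop */
            }
            if (need == 1) {
                dst[n] = (uint16_t)cp;
            } else {
                uint32_t v = cp - 0x10000;
                dst[n]     = (uint16_t)(0xD800 + (v >> 10));
                dst[n + 1] = (uint16_t)(0xDC00 + (v & 0x3FF));
            }
        }
        n += need;
    }
    return n;
}
```

Calling it first with a NULL destination gives the length to allocate. The value returned by the second call, rather than the input byte count, is the length the resulting COM string should carry.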


[2007-05-23 11:51 UTC] fjortiz at comunet dot es

Any updates on this? I'm showing you the code to fix the issue. Please tell me what else you need to get this into the main tree.

TIA

[2007-05-28 15:07 UTC] tony2001@php.net

Unfortunately the extension is currently unmaintained, so I don't think anybody is going to commit your patch in the near future.

[2007-07-25 14:28 UTC] ameoba32 at gmail dot com

I found this bug too. The code is like this:

$word = new COM("Word.Application", null, CP_UTF8);
$word->Visible = true;
$doc = $word->Documents->Add();
$word->Selection->TypeText( "UTF8 text here, russian in my case" );

Word shows the normal text followed by garbage.

Can this be fixed in CVS? For now I have to compile PHP on Win32 myself ;)

[2007-08-15 08:36 UTC] jani@php.net

Assigned to the maintainer.

[2008-10-22 14:05 UTC] vincent at eal dot com

Any update on this bug? I still have the problem with PHP 5.2.3 and the changelog does not indicate any fix in 5.2.6.

It is very annoying to have a nice patch standing there still with no one to commit it to CVS.

[2009-03-22 10:12 UTC] j dot novak at netsystem dot cz

I see the same problem with ADODB.Connection and its parameters when I use:

// init connection to DB
$dbc = new COM('ADODB.Connection',null,CP_UTF8);
$dbc->Open("PROVIDER=MSDASQL;Driver={SQL Server};Server=192.168.210.1;Database=test;UID=test;PWD=test");
// create command
$oCmd = new COM('ADODB.Command',null,CP_UTF8);
$oCmd->ActiveConnection = $dbc;
// Table test1 has one column c1 of type nvarchar(200)
$oCmd->CommandText = "INSERT INTO test1(c1) VALUES (?)";
$oCmd->CommandType = 1;
// Some UTF-8 string (length is 15 in characters, 24 in bytes)
$val='ABCříšúěďáéóXYZ';
$len=strlen($val);
$p=$oCmd->CreateParameter('name',202,1,$len,$val);
$oCmd->Parameters->Append($p);
$oCmd->Execute();
// ADODB sends nvarchar(24) to the DB, which is the length of the string
// in bytes, not characters, but the data has only 15 characters
// in UCS-2. So the database holds the correct string, followed by
// some garbage after the end of the string.

[2009-10-20 10:36 UTC] fjortiz at comunet dot es

After 2.5 years, can someone just commit the patch I provided to CVS? It's annoying to recompile every PHP version that comes out just because of this nonsense. Keep in mind that any use of PHP+COM+multibyte strings fails without this silly fix.

Thank you.

[2009-10-20 10:50 UTC] pajoye@php.net

Taking this one over.

[2010-08-01 17:45 UTC] ivars_ju at inbox dot lv

Hi,
I still have this bug in PHP version 5.3.3.

Is it so hard to commit this?

[2012-08-08 07:24 UTC] sager at agitos dot de

The patch is still missing in PHP 5.3.15 (I don't know why, the patch is working: thanks to fjortiz!)

So I compiled a php_com_dotnet.dll that is compatible with http://windows.php.net/downloads/releases/php-5.3.15-nts-Win32-VC9-x86.msi

Download the compiled DLL at http://www.agitos.de/pub/php-5.3.15-com_dotnet-patch.zip

[2012-11-06 17:23 UTC] ddowns at online-access dot com

I went to apply the patch and it's already there in 5.3.18. Not sure when this was actually solved, but the bug status needs updating.

[2014-07-17 07:09 UTC] fjortiz at comunet dot es

Some update: while working with 5.3.28, it seems to work better now, but there is still a length-related bug inside.

When inserting strings with accents or the Spanish "ñ" character from PHP into MSSQL Server through the COM driver (UTF-8 encoded), the insert seems to work, but if I inspect the last character in the column I find a "\0" in the last position, which makes all my "=" queries fail.

"\0" is the C language string terminator so there is clearly some kind of memory allocation mismatch in the code. 

Oddly enough queries with "where column='textwithñfails'+char(0)" do work, so I'll use this trick as a workaround meanwhile.
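
One way a trailing "\0" like this could arise, offered here as an assumption rather than something this report confirms, is a counted string whose length includes the C terminator. The sketch below uses a hypothetical `counted_str` type, loosely modelled on a length-prefixed COM string, to show why such a value stops comparing equal to the intended text and why appending char(0) on the query side makes the comparison match again.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counted string: an explicit length plus UTF-16 data.
 * Unlike a C string, the data may legitimately contain NUL units. */
typedef struct {
    size_t len;             /* length in UTF-16 code units */
    const uint16_t *data;
} counted_str;

/* Equality as a length-aware consumer would see it: the lengths must
 * match before the contents are even compared. */
int counted_equal(counted_str a, counted_str b)
{
    return a.len == b.len &&
           memcmp(a.data, b.data, a.len * sizeof(uint16_t)) == 0;
}

/* Drop one trailing NUL unit, for the case where the length was
 * computed with the terminator included. */
counted_str strip_trailing_nul(counted_str s)
{
    if (s.len > 0 && s.data[s.len - 1] == 0) {
        s.len--;
    }
    return s;
}
```

With the text "nñ" stored as three units (two characters plus a NUL), a value of length 3 does not equal the same text with length 2, but it does after `strip_trailing_nul`.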

[2017-10-24 07:30 UTC] kalle@php.net

-Status: Assigned
+Status: Open
-Assigned To: pajoye
+Assigned To:

[2020-02-03 08:55 UTC] cmb@php.net

-Status: Open
+Status: Duplicate
-Assigned To:
+Assigned To: cmb

[2020-02-03 08:55 UTC] cmb@php.net

This appears to be a duplicate of bug #66431, which is fixed as of
PHP 5.4.29 and 5.5.13, respectively.



Copyright © 2001-2025 The PHP Group All rights reserved.	Last updated: Thu Oct 30 02:00:01 2025 UTC