Bug #31327 chinese char and word problem
Submitted: 2004-12-28 20:46 UTC Modified: 2015-02-18 01:01 UTC
Votes:46
Avg. Score:4.6 ± 0.8
Reproduced:19 of 20 (95.0%)
Same Version:5 (26.3%)
Same OS:17 (89.5%)
From: vtsuper1 at mail dot hongkong dot com Assigned:
Status: Duplicate Package: COM related
PHP Version: 5CVS-2005-08-30 OS: win32
Private report: No CVE-ID: None
 [2004-12-28 20:46 UTC] vtsuper1 at mail dot hongkong dot com
Description:
------------
I have tried this code with PHP 5.0.3 and 4.3, on Apache 2.
I'm using Chinese Windows XP and Chinese Office XP.

Both versions of PHP have the same problem.
When the string contains Chinese characters, the output Word file has some meaningless spaces after the expected text (the expected text itself is fine). There is no problem with English text, and the problem does not occur in Excel.

Reproduce code:
---------------
<?php
// starting word
$word = new COM("word.application") or die("Unable to instantiate Word");
echo "Loaded Word, version {$word->Version}\n";

//bring it to front
$word->Visible = 1;

//open an empty document
$word->Documents->Add();

//do some weird stuff
$word->Selection->TypeText("?A?n??");
$word->Documents[1]->SaveAs("Useless test.doc");

//closing word
$word->Quit();

//free the object
$word = null;
?> 

Expected result:
----------------
?A?n??

Actual result:
--------------
?A?n??____

(Some extra spaces appear after the text; I use underscores to represent them.)

 [2005-08-02 23:49 UTC] sniper@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php5-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5-win32-latest.zip


 [2005-08-03 05:17 UTC] vtsuper1 at mail dot hongkong dot com
Dear sir,

Thanks for your help. I tested it with a PHP 5.1.x version, but the bug is not fixed. The only change is that the meaningless spaces are now square characters.

Anyway, thanks for your help. (This problem does not occur in Excel; it only happens in Word.)

From Victor
 [2005-08-03 07:51 UTC] sniper@php.net
Assigned to the maintainer of COM extension.
 [2005-08-08 11:21 UTC] bmcrowley at lcwarriormail dot com
I am experiencing the same bug with Arabic. I pull UTF-8 text from a database and push it into Word with UTF-8 as the character set. Meaningless characters (boxes) appear after (to the right of) the Arabic text. The Arabic text also gets mangled, with words out of order.

I would try to post the input/output here, but it would be meaningless...
 [2005-08-11 16:11 UTC] wez@php.net
Make sure you're setting up the code page properly.
From the manual:

com.code_page

    It controls the default character set code-page to use when passing strings to and from COM objects. If set to an empty string, PHP will assume that you want CP_ACP, which is the default system ANSI code page.

    If the text in your scripts is encoded using a different encoding/character set by default, setting this directive will save you from having to pass the code page as a parameter to the COM class constructor. Please note that by using this directive (as with any PHP configuration directive), your PHP script becomes less portable; you should use the COM constructor parameter whenever possible.

Consult MSDN for more information on code pages.
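
A minimal sketch of the constructor-parameter approach that the manual text above recommends. It assumes the script text is UTF-8 encoded and that the COM extension and Word are installed; the helper name `open_word` is hypothetical, while `CP_UTF8` is a constant predefined by the COM extension.

```php
<?php
// Hypothetical helper: start Word with an explicit code page instead of
// relying on the com.code_page ini setting.
function open_word($codePage)
{
    // The third constructor argument is the code page used to convert
    // strings passed to and from the COM object.
    return new COM("word.application", null, $codePage);
}

// Only attempt this where the COM extension is loaded (Windows).
if (class_exists('COM')) {
    $word = open_word(CP_UTF8); // assumes the script text is UTF-8
    echo "Loaded Word, version {$word->Version}\n";
    $word->Quit();
    $word = null;
}
```

Passing the code page per object keeps the setting next to the code that depends on it, which is the portability point the manual makes.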
 [2005-08-16 04:53 UTC] vtsuper1 at mail dot hongkong dot com
Following your information, I modified my code to pass the correct code page, but the result is exactly the same as with my first script: some meaningless squares appear after my Chinese text. (I'm quite confused about why only Word has this problem and Excel doesn't.)

Here are some links related to this problem:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/act/htm/actml_ref_scpg.asp
http://bugs.php.net/bug.php?id=31327
http://bugs.php.net/bug.php?id=28117
http://hk.php.net/manual/en/class.com.php

Here is my new script:
<?php

// Code pages to try: 950 = Big5 (Traditional Chinese), 0 = CP_ACP,
// 1 = CP_OEMCP, 2 = CP_MACCP, 3 = CP_THREAD_ACP, 65001 = CP_UTF8
$code_pages = array(950, 0, 1, 2, 3, 65001);

foreach ($code_pages as $code_page) {
    // starting word
    $word = new COM("word.application", NULL, $code_page) or die("Unable to instantiate Word");
    echo "Loaded Word, version {$word->Version}\n";

    //bring it to front
    $word->Visible = 1;

    //open an empty document
    $word->Documents->Add();

    //do some weird stuff
    $word->Selection->TypeText("???Æ?");
    // single quotes so the backslashes in the Windows path stay literal
    $word->Documents[1]->SaveAs('D:\AppServ\www\word\\' . $code_page . '.doc');

    //closing word
    $word->Quit();

    //free the object
    $word = null;
}
?>
 [2005-09-08 06:05 UTC] vtsuper1 at mail dot hongkong dot com
I have looked at the latest PHP versions, 5.0.5 and 5.1, but the changelogs do not mention a fix for this bug. Could you tell me whether the developers do not consider this a bug and will not fix it, or whether a fix is in progress but not yet finished?
 [2007-06-15 07:41 UTC] tonyhook dot su at gmail dot com
After I upgraded PHP to 5.2.0, this problem still exists.
Several encoding-conversion approaches I tried had no effect.

Windows Server 2003 R2 SP2
Apache 2
Office 2003 SP2
 [2013-12-13 07:44 UTC] wez@php.net
-Status: Assigned
+Status: Open
-Assigned To: wez
+Assigned To:
 [2015-02-18 00:31 UTC] katszym at gmail dot com
I think I'm experiencing the same problem, but I'm not sure because I can't see the original problem text on this page.  

I have documented my problem on StackOverflow with simple code to reproduce: http://stackoverflow.com/questions/28534219/writing-utf-8-strings-to-word-using-php-com 

Environment: I'm using Windows 7 Enterprise, Apache, MySQL, PHP 5.2.17, and Microsoft Office 2010.

It seems like the string length is not calculated correctly (using bytes and not characters), resulting in a longer string with garbage at the end.
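
The byte-versus-character distinction mentioned above can be illustrated with a short sketch. It does not touch COM at all; it only shows that a UTF-8 string's byte length can exceed its character count, which is the kind of mismatch that could leave trailing garbage if a byte count were treated as a character count. The sample text and the function name `describe_lengths` are illustrative assumptions, and `mb_strlen` requires the mbstring extension.

```php
<?php
// Hypothetical helper: report byte length versus character length
// of a UTF-8 encoded string.
function describe_lengths($utf8)
{
    return array(
        'bytes' => strlen($utf8),             // counts bytes
        'chars' => mb_strlen($utf8, 'UTF-8'), // counts code points
    );
}

// UTF-8 bytes for two Chinese characters (U+4E2D, U+6587), three bytes each.
$lengths = describe_lengths("\xE4\xB8\xAD\xE6\x96\x87");
echo "bytes: {$lengths['bytes']}, chars: {$lengths['chars']}\n"; // bytes: 6, chars: 2

// For plain ASCII the two counts agree, matching the report that English text works.
$ascii = describe_lengths("test");
echo "bytes: {$ascii['bytes']}, chars: {$ascii['chars']}\n";     // bytes: 4, chars: 4
```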
 [2015-02-18 01:01 UTC] requinix@php.net
-Status: Open
+Status: Duplicate
 [2015-02-18 01:01 UTC] requinix@php.net
@katszym:
See bug #66431. Looks like it was fixed in PHP 5.4.29.
 
Last updated: Sat Apr 20 05:01:27 2024 UTC