Bug #54569 echo,proc_open,... issues with UTF-8 terminal
Submitted: 2011-04-19 17:56 UTC Modified: 2011-04-19 23:24 UTC
From: vmiszczak at ankama dot com Assigned:
Status: Not a bug Package: *General Issues
PHP Version: 5.3.6 OS: Windows
Private report: No CVE-ID: None
 [2011-04-19 17:56 UTC] vmiszczak at ankama dot com
Description:
------------
These functions (at least) behave unexpectedly when run in a UTF-8 terminal under 
Windows.
For instance, calling PHP's 'echo' on a UTF-8 string does not output the string!
Printing the same string with fwrite(STDOUT,$string) works.

More problematic: when I parse UTF-8 data and pass it as arguments to an external 
program (I've tried popen, exec, proc_open, passthru), the program does not 
decode the arguments correctly.
Writing the script output to a batch file and executing the batch file works.

Test script:
---------------
Launch cmd.exe.
Execute chcp 65001 to make the terminal use UTF-8 (make sure your font supports this; use Lucida Console, for instance).

Run this script:

<?php
$data="ï";
fwrite(STDOUT,$data."\n");
echo $data."\n";
?>


Expected result:
----------------
I expect UTF-8 data to be displayed correctly, and command strings passed to 
exec() and similar functions to be decoded correctly by the called program.

Actual result:
--------------
ï

��

 [2011-04-19 18:10 UTC] pajoye@php.net
-Status: Open +Status: Bogus
 [2011-04-19 18:10 UTC] pajoye@php.net
Windows has no idea about UTF-8, especially not in console mode. But that does not 
mean the data you are echo'ing is not correct UTF-8.
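
The claim that the echoed data is still valid UTF-8 can be checked independently of how the console renders it. The following is a minimal sketch, not part of the original report: the helper name describeBytes is hypothetical, it assumes PHP 7 or later (for the type declarations), and the string is written with explicit byte escapes so the result does not depend on the encoding of the source file.

```php
<?php
// Hypothetical helper (not from the report): summarise the raw bytes of a string.
function describeBytes(string $s): array
{
    return [
        'hex'        => bin2hex($s),                 // raw bytes as hexadecimal
        'length'     => strlen($s),                  // byte count, not character count
        'valid_utf8' => preg_match('//u', $s) === 1, // the /u modifier rejects malformed UTF-8
    ];
}

$data = "\xC3\xAF"; // the two UTF-8 bytes of U+00EF (ï)
$info = describeBytes($data);

// Print only ASCII so the result is readable whatever the console code page is.
echo $info['hex'], ' ', $info['length'], ' ', $info['valid_utf8'] ? 'valid' : 'invalid', "\n";
```

If the hex dump is the same whether the string is later sent through echo or fwrite(STDOUT, ...), the difference seen on screen would come from the output layer rather than from the data itself.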
 [2011-04-19 18:32 UTC] vmiszczak at ankama dot com
"Windows has no idea about UTF-8"
Are you serious?!!
The C function WideCharToMultiByte(CP_UTF8,NULL,native,-1,encoded,sizeof(encoded),NULL,NULL) 
does encode as UTF-8.

printf("%s",encoded) produces correct output in a correctly configured 
terminal, just as it does under Linux.

PHP does not.
 [2011-04-19 18:41 UTC] pajoye@php.net
The operating system runtime has no idea about UTF-8 in its shell, its file name 
encoding, etc.

Its internal APIs obviously provide UTF-8 conversion functions (among others).

Secondly, as I said earlier, you have to do the conversion manually, as PHP does 
not use the wide-char APIs.
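
The manual conversion mentioned above could look roughly like the following. This is a sketch under assumptions: the function name toConsoleCodePage is hypothetical, the iconv extension is assumed to be available, PHP 7.1 or later is assumed (for the nullable return type), and CP850 is only an example of a console code page (chcp shows the active one). It can only help when every character of the string exists in the target code page.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to a single-byte console code page.
// Returns null when iconv() reports a failure.
function toConsoleCodePage(string $utf8, string $codePage = 'CP850'): ?string
{
    // The @ operator silences the notice iconv() raises on failure,
    // so the caller can test for null instead.
    $converted = @iconv('UTF-8', $codePage, $utf8);
    return $converted === false ? null : $converted;
}

$data = "\xC3\xAF"; // UTF-8 bytes of ï
$out  = toConsoleCodePage($data);

// Show the converted bytes as hex rather than writing them raw to the console.
echo $out === null ? "conversion failed\n" : bin2hex($out) . "\n";
```

A string mixing several scripts generally has no single-byte code page that covers all of it, so this approach does not apply to that case.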
 [2011-04-19 23:24 UTC] vmiszczak at ankama dot com
OK, I think I understand what you mean. Windows does not handle strings as UTF-8 
internally. I'm OK with that; neither does Linux. It really does not matter as 
long as you use the correct encoding/decoding.

Under Linux, I can name a file eï.php and call "php eï.php". It will execute 
that file (assuming locales are UTF-8).
Here is an illustration:
# echo "<?php echo \"ï\n\";?>" > eï.php
# php eï.php 
ï
# locale
LANG=fr_FR.UTF-8
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=

Doing the same under Windows (with cmd.exe set to code page 65001 via chcp):
c:\>chcp 65001
Active code page: 65001

c:\>type eï.php
<?php echo "ï\n";?>

c:\>php eï.php
��

The "type" command is able to decode the characters. The Windows terminal handles 
UTF-8; you just have to activate it, just like on Linux.

If I change the script to:
c:\>type eï.php
<?php echo "A ï in a string\n";?>

c:\>php eï.php
A ï in a string

So it seems to fail only under certain circumstances.

This is a simple example.
My goal is to call (using proc_open() or similar) a program that takes Unicode 
parameters. I cannot convert those parameters to another encoding because they 
are file paths that contain multilingual characters: a mix of several languages 
in one string, which makes it impossible to encode in a single-byte code page. 
If I write a UTF-8 batch file and call it, there is no problem; the program 
handles Unicode and does its job. Using proc_open() or similar, the characters 
are decoded the wrong way. Writing the script to a file and launching it from 
the filesystem is not acceptable because the PHP program I'm working on needs 
performance.
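
One way to narrow down where the argument bytes get altered is a round-trip test: launch a child process with a UTF-8 argument and have it report the bytes it received. The sketch below is an illustration, not a fix. It assumes PHP 7.4 or later (for the array form of proc_open(), which bypasses the shell), uses the PHP CLI itself as a stand-in for the external tool, and the function name roundTripArgument is hypothetical.

```php
<?php
// Hypothetical diagnostic: pass $arg to a child PHP process and return the
// hex dump of the bytes the child saw in $argv[1].
function roundTripArgument(string $arg): string
{
    // Array form (PHP 7.4+): the command is not passed through a shell.
    // '--' ends PHP's own options so $arg reaches the child as $argv[1].
    $cmd  = [PHP_BINARY, '-r', 'echo bin2hex($argv[1]);', '--', $arg];
    $spec = [1 => ['pipe', 'w']]; // capture the child's stdout

    $proc = proc_open($cmd, $spec, $pipes);
    if ($proc === false) {
        throw new RuntimeException('proc_open() failed');
    }

    $hex = (string) stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);

    return $hex;
}

$sent     = "\xC3\xAF"; // UTF-8 bytes of ï
$received = roundTripArgument($sent);

echo 'sent:     ', bin2hex($sent), "\n";
echo 'received: ', $received, "\n";
```

If the two hex strings differ, the bytes were changed between the parent and the child's argument list; if they match but the real tool still misreads them, the tool is probably decoding its command line with a different code page.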

I'm working through interesting articles like 
http://stackoverflow.com/questions/2706097/how-to-do-proper-unicode-and-ansi-output-redirection-on-cmd-exe. 
It could be a Windows bug.

I'd really like to work on Linux, but the project launches Windows tools :/
 [2011-11-30 22:36 UTC] tomek at dma dot net dot pl
<?php
$data="ąęćńś";
fwrite(STDOUT,$data."\n");

?>

This does not work!
 