Bug #54569 echo,proc_open,... issues with UTF-8 terminal
Submitted: 2011-04-19 17:56 UTC Modified: 2011-04-19 23:24 UTC
From: vmiszczak at ankama dot com Assigned:
Status: Not a bug Package: *General Issues
PHP Version: 5.3.6 OS: Windows
Private report: No CVE-ID: None
 [2011-04-19 17:56 UTC] vmiszczak at ankama dot com
Description:
------------
These functions (at least) behave strangely when using a UTF-8 terminal under 
Windows.
For instance, calling PHP's 'echo' on a UTF-8 string does not output the string!
Printing the same string with fwrite(STDOUT,$string) works.

More problematic: when I parse UTF-8 data and try to execute a program (I've tried 
popen, exec, proc_open, passthru) with that data as program arguments, the 
program does not know which encoding to use to decode them.
Writing the script output to a batch file and executing the batch file works.

Test script:
---------------
Launch cmd.exe.
Execute chcp 65001 to make the terminal use UTF-8 (make sure your font supports this; use Lucida Console, for instance).

Launch this script:

<?php
$data="ï";
fwrite(STDOUT,$data."\n");
echo $data."\n";
?>


Expected result:
----------------
I expect UTF-8 data to be displayed correctly, and command strings passed to 
exec() and similar functions to be decoded correctly.

Actual result:
--------------
ï

��

 [2011-04-19 18:10 UTC] pajoye@php.net
-Status: Open +Status: Bogus
 [2011-04-19 18:10 UTC] pajoye@php.net
Windows has no idea about UTF-8, especially not in console mode. But that does not 
mean the data you are echo'ing is not correct UTF-8.
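
One way to check that the bytes being echoed are valid UTF-8, regardless of how the console renders them, is to look at the raw bytes. A minimal sketch, assuming PHP 7 or later (for the type declarations); the helper name describe_bytes is hypothetical and not part of PHP:

```php
<?php
// Hypothetical helper: report the raw bytes of a string and whether they
// form valid UTF-8, without relying on the console to render them.
function describe_bytes(string $s): string
{
    // preg_match() with the /u modifier returns false on malformed UTF-8
    // and 1 otherwise, because the empty pattern matches any valid subject.
    $valid = preg_match('//u', $s) === 1;
    return bin2hex($s) . ($valid ? ' (valid UTF-8)' : ' (not valid UTF-8)');
}

// "\xC3\xAF" is the UTF-8 encoding of U+00EF (ï), written as escapes so the
// example does not depend on the encoding of this source file.
echo describe_bytes("\xC3\xAF"), "\n";
// A lone 0xEF byte (ï in ISO-8859-1) is not valid UTF-8.
echo describe_bytes("\xEF"), "\n";
```

If the first line reports c3af as valid, the string itself is fine and any garbling happens on the way to the console.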
 [2011-04-19 18:32 UTC] vmiszczak at ankama dot com
"Windows has no idea about UTF-8"
Are you serious?!!
The C function WideCharToMultiByte(CP_UTF8,0,native,-1,encoded,sizeof(encoded),NULL,NULL) 
does encode as UTF-8.

printf("%s",encoded) does output the correct characters in a properly configured 
terminal, just as it does under Linux.

PHP does not.
 [2011-04-19 18:41 UTC] pajoye@php.net
The operating system runtime has no idea about UTF-8, in its shell, its file name 
encoding, etc.

Its internal APIs obviously provide UTF-8 conversion functions (among others).

Secondly, as I said earlier, you have to do the conversion manually, as PHP does not 
use the wide-character (WideChar) APIs.
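
A manual conversion of this kind could look roughly like the following. This is only a sketch: it assumes the iconv extension is available and that the console uses a single-byte code page. CP850 is just an example (chcp shows the active one), and the helper name to_console is hypothetical.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to the console's code page
// before printing it. Returns null if the conversion fails.
function to_console(string $utf8, string $codePage): ?string
{
    // iconv() typically returns false (and raises a notice) when the input
    // cannot be converted; @ suppresses the notice so the caller gets null.
    $converted = @iconv('UTF-8', $codePage, $utf8);
    return $converted === false ? null : $converted;
}

// "\xC3\xAF" is ï in UTF-8; CP850 is assumed here as the console code page.
$converted = to_console("\xC3\xAF", 'CP850');
echo $converted === null ? "not convertible\n" : bin2hex($converted) . "\n";
```

This only works for characters that exist in the target code page, so it cannot help with strings that mix several scripts.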
 [2011-04-19 23:24 UTC] vmiszczak at ankama dot com
OK, I think I understand what you mean. Windows does not handle strings as UTF-8 
internally. I'm OK with that; neither does Linux. It really does not matter as long 
as you use the correct encoding/decoding.

Under Linux, I can name a file eï.php and call "php eï.php". It will execute 
that file (assuming locales are UTF-8).
Here is an illustration:
# echo "<?php echo \"ï\n\";?>" > eï.php
# php eï.php 
ï
# locale
LANG=fr_FR.UTF-8
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=

Doing the same under Windows (having the cmd.exe use chcp 65001):
c:\>chcp 65001
Active code page: 65001

c:\>type eï.php
<?php echo "ï\n";?>

c:\>php eï.php
��

The "type" command is able to decode characters. Windows terminal handles UTF-8. 
You just have to activate it. Just like Linux.

If I change the script to :
c:\>type eï.php
<?php echo "A ï in a string\n";?>

c:\>php eï.php
A ï in a string

So it seems it fails under certain circumstances. 

This is a simple example.
My goal is to call (using proc_open() or similar) a program that takes Unicode 
parameters. I cannot convert those parameters to a single-byte code page because 
they are file paths that contain multilingual characters: a mix of several 
languages in one string, which makes it impossible to encode each character in 
1 byte. If I write a UTF-8 batch file and call it, there is no problem: the 
program handles Unicode and does its job. Using proc_open() or similar, the 
characters are decoded the wrong way. Writing the script to a file and launching 
it from the filesystem is not acceptable because the PHP program I'm working on 
needs to be fast.
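
One possible workaround, when the target program can read its input from standard input, is to send the paths through a pipe instead of the command line, so the bytes reach the child process unchanged. This is only a sketch under several assumptions: PHP 7.4 or later (for the array form of proc_open()), a second PHP process standing in for the real tool, and a hypothetical helper name run_with_stdin. It does not help with tools that accept paths only as arguments.

```php
<?php
// Hypothetical helper: run $command, write $input to its stdin unchanged,
// and return whatever it prints on stdout.
function run_with_stdin(array $command, string $input): string
{
    $spec = [
        0 => ['pipe', 'r'],  // child's stdin: the parent writes to it
        1 => ['pipe', 'w'],  // child's stdout: the parent reads from it
    ];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start the child process');
    }
    fwrite($pipes[0], $input);
    fclose($pipes[0]);               // signal end of input to the child
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);
    return $output;
}

// Stand-in child: a PHP one-liner that hex-dumps the bytes it receives.
$child = [PHP_BINARY, '-r', 'echo bin2hex(stream_get_contents(STDIN));'];
// "e\xC3\xAF.php" is the file name eï.php as UTF-8 bytes.
echo run_with_stdin($child, "e\xC3\xAF.php"), "\n";
```

Writing all the input before reading the output is fine for short strings like paths; larger payloads would need interleaved reads and writes to avoid blocking on a full pipe.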

I'm reading interesting articles such as 
http://stackoverflow.com/questions/2706097/how-to-do-proper-unicode-and-ansi-output-redirection-on-cmd-exe. 
It could be a Windows bug.

I'd really like to work on Linux, but the project launches Windows tools :/
 [2011-11-30 22:36 UTC] tomek at dma dot net dot pl
<?php
$data="ąęćńś";
fwrite(STDOUT,$data."\n");

?>

This does not work!
 
Last updated: Wed May 21 10:02:02 2025 UTC