Bug #72555 CLI output(japanese) on Windows
Submitted: 2016-07-06 17:12 UTC Modified: 2016-12-18 00:15 UTC
Votes:2
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: algo13 dot net at gmail dot com Assigned: ab (profile)
Status: Closed Package: CGI/CLI related
PHP Version: Next Minor Version OS: Windows 8.1 (Japanese edition)
Private report: No CVE-ID: None

 [2016-07-06 17:12 UTC] algo13 dot net at gmail dot com
Description:
------------
Running echo '日本語'; from a script saved as cp932, after calling ini_set('default_charset', 'CP932');, prints every character twice on the console.
=> result: 日日本本語語

Test script:
---------------
C:\php-7.1.0alpha2-Win32-VC14-x64>chcp
Active code page: 932

C:\php-7.1.0alpha2-Win32-VC14-x64>type php.ini | findstr "^default_charset"
default_charset = "UTF-8"

C:\php-7.1.0alpha2-Win32-VC14-x64>type iniset.php
<?php
ini_set('default_charset', 'CP932');
echo '日本語', PHP_EOL;
/* vim:set fenc=cp932 ff=dos: */

C:\php-7.1.0alpha2-Win32-VC14-x64>php iniset.php
日日本本語語

C:\php-7.1.0alpha2-Win32-VC14-x64>

History

 [2016-07-06 18:07 UTC] ab@php.net
-Assigned To: +Assigned To: ab
 [2016-07-11 13:14 UTC] ab@php.net
-Status: Assigned +Status: Verified
 [2016-07-11 13:14 UTC] ab@php.net
I've investigated this, and it seems to be a console issue with this particular codepage. Setting the charset in php.ini, or even passing "-d default_charset=..." directly, works fine. Output redirected to another program or into a file is also correct.

I currently see no API that would really affect the behavior. It seems that when the codepage is switched, the console tries to remap chars to the other codepage. A simple proof of this theory: add the line "fscanf(STDIN, "%s", $dummy);" at the end of the reproduce code. As long as PHP hasn't switched back to UTF-8, the output looks correct.

I suspect that a similar issue is present with other double-byte codepages. With a single-byte codepage like 1252, switching the codepage at runtime is a non-issue. For now I can only recommend setting the charset through php.ini or -d and not switching it at runtime on the console; I've already added a note to UPGRADING. I'm going to keep looking for a solution and keep this ticket open.

Thanks.
 [2016-12-14 02:33 UTC] ab@php.net
-Status: Verified +Status: Feedback
 [2016-12-14 02:33 UTC] ab@php.net
@algo13, a patch has landed in 7.1 that separates the input and output codepages for CLI. Could you please check the latest snapshots? Basically, all that should be needed is setting output_encoding=cp932, while the default encoding can stay UTF-8. There is no change to the default behavior, though.

Thanks.
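
As a sanity check for the setup described above, a small script can print the charset-related INI values before any text is written to the console. This is only a sketch: the helper name encoding_settings() and the file name check.php are hypothetical, and the launch command in the comment assumes the settings are passed with -d rather than through php.ini.

```php
<?php
// Hypothetical helper (not part of PHP): collect the charset-related INI
// values so they can be inspected before anything is echoed.
function encoding_settings(): array
{
    $names = ['default_charset', 'input_encoding', 'output_encoding'];
    $settings = [];
    foreach ($names as $name) {
        // ini_get() returns the current value as a string, or false if the
        // option does not exist in this build.
        $settings[$name] = ini_get($name);
    }
    return $settings;
}

// Assumed launch (file name is illustrative):
//   php -d output_encoding=cp932 check.php
foreach (encoding_settings() as $name => $value) {
    echo $name, ' = ', var_export($value, true), PHP_EOL;
}
```

With the launch line from the comment, output_encoding should be reported as cp932 while default_charset keeps whatever php.ini sets.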
 [2016-12-16 22:11 UTC] ab@php.net
@algo13, I've got to the bottom of it now. It is best described here:

https://github.com/Maximus5/ConEmu/issues/739#issuecomment-228670160

[quote]
DBCS versions of Windows works absolutely different than non-DBCS.
When you run application on DBCS Windows and use double (four) byte codepage (like 932) each CJK takes real two cells. .... the console doubles each CJK (first will have COMMON_LVB_LEADING_BYTE and second - COMMON_LVB_TRAILING_BYTE flag) and you have this glyph in TWO cells, otherwise [A] functions will fail to read 932 codepage!

The only exception is codepage 65001. It uses one unicode (wchar_t) real cell.
[/quote]

This is exactly what happens. One can observe that when chcp is used on a system whose default codepage is 932, the screen is cleared every time. That is not for nothing. Of course, the display is not cleared when ini_set() is called, and that causes the glyph duplication effect. With the new approach, UTF-8 is still the default, but if you set input_encoding and output_encoding to cp932, the console codepage won't be changed at all. So you can work with UTF-8 internally while having 932 on the console. Note that you'll need to take care of the automatic I/O conversion, though.

With this, the issue looks solved to me.

Thanks.
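
To illustrate what "UTF-8 internally, 932 on the console" means at the byte level, here is a minimal sketch that converts the string from the report explicitly. It assumes the iconv extension is available and that it knows the encoding name CP932; the helper utf8_to_cp932() is hypothetical and is not something the patch provides. Whether PHP performs such a conversion on its own is not stated in this thread, so treat this as one possible way to handle the conversion by hand.

```php
<?php
// Hypothetical helper (not part of PHP): convert a UTF-8 string to CP932
// bytes with iconv(), failing loudly if the conversion is not possible.
function utf8_to_cp932(string $utf8): string
{
    $converted = iconv('UTF-8', 'CP932', $utf8);
    if ($converted === false) {
        throw new RuntimeException('String could not be converted to CP932');
    }
    return $converted;
}

// "\u{...}" escapes (PHP 7+) produce UTF-8 bytes regardless of the encoding
// the script file itself is saved in.
$text = "\u{65E5}\u{672C}\u{8A9E}"; // the three characters from the report

echo bin2hex($text), PHP_EOL;                // bytes of the UTF-8 form
echo bin2hex(utf8_to_cp932($text)), PHP_EOL; // bytes of the CP932 form
```

The UTF-8 form takes three bytes per character and the CP932 form two, which is why the same string needs a conversion step before its bytes mean the same thing to a console running codepage 932.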
 [2016-12-18 00:15 UTC] ab@php.net
-Status: Feedback +Status: Closed
 [2016-12-18 00:15 UTC] ab@php.net
The fix for this bug has been committed.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

 For Windows:

http://windows.php.net/snapshots/
 
Thank you for the report, and for helping us make PHP better.

The UPGRADING document was updated; please also check the snapshots.

Thanks.