Bug #75063 Main CWD initialized with wrong codepage
Submitted: 2017-08-11 10:14 UTC Modified: 2017-08-14 14:37 UTC
From: anrdaemon at freemail dot ru Assigned: ab
Status: Closed Package: Filesystem function related
PHP Version: 7.1.8 OS: Windows
Private report: No CVE-ID:
 [2017-08-11 10:14 UTC] anrdaemon at freemail dot ru
Description:
------------
Many filesystem functions that read, write or otherwise deal with file names are unable to work with multibyte encoded filenames, such as when internal_encoding is equal to UTF-8.

Even the example in https://php.net/opendir doesn't work if the path contains multibyte sequences.

Test script:
---------------
<?php

print ini_get("internal_encoding") . "\n";
print ini_get("default_charset") . "\n";
foreach(["test", "тест"] as $fn)
  file_put_contents("$fn.txt", "");

print_r(glob("*.txt"));


Expected result:
----------------
$ php -nf ./xx.php

UTF-8
Array
(
    [0] => test.txt
    [1] => тест.txt
)

Actual result:
--------------
$ php -nf ./xx.php

UTF-8
Array
(
    [0] => test.txt
)

History
 [2017-08-11 10:40 UTC] requinix@php.net
-Status: Open +Status: Feedback
 [2017-08-11 10:40 UTC] requinix@php.net
Works for me. Are you sure you saved the file in UTF-8 format?
 [2017-08-11 12:04 UTC] anrdaemon at freemail dot ru
Weird. It works for me when only the file name contains multibyte characters, but breaks when the path contains them as well, and even then only for multibyte file names.
Can you please try it with a parent directory name containing multibyte characters?

Another interesting finding: getcwd() returns the path in the native encoding rather than in UTF-8.

$ php.exe -nf opendir.php 
<?php
print file_get_contents(__FILE__) . "-- \n";
print ini_get("internal_encoding") . "\n";
print ini_get("default_charset") . "\n";
print getcwd() . "\n";
print iconv('CP1251', 'UTF-8', getcwd()) . "\n";
foreach(["test", "тест"] as $fn)
  file_put_contents("$fn.txt", "");

if ($dh = opendir(getcwd())) {
    while (($file = readdir($dh)) !== false) {
        echo "filename: $file : filetype: " . filetype($file) . "\n";
    }
    closedir($dh);
}
--

UTF-8
C:\dev\temp\

C:\dev\temp\тест
filename: . : filetype: dir
filename: .. : filetype: dir
filename: opendir.php : filetype: file
filename: test.txt : filetype: file
filename: UTF-8.php : filetype: file

Warning: filetype(): Lstat failed for тест.txt in C:\dev\temp\тест\opendir.php on line 12
filename: тест.txt : filetype:
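
A minimal sketch for checking the getcwd() observation above without iconv or mbstring: it only tests whether the returned path is well-formed UTF-8. The helper name is_utf8() is hypothetical and not part of PHP. It relies on preg_match() with the /u modifier, assuming PCRE is available in the build.

```php
<?php
// Hypothetical helper, not part of PHP: true when $s is well-formed UTF-8.
function is_utf8(string $s): bool
{
    // With the /u modifier PCRE validates the subject first; for a malformed
    // UTF-8 subject preg_match() returns false instead of 1.
    return preg_match('//u', $s) === 1;
}

$cwd = getcwd();
if ($cwd === false) {
    echo "getcwd() failed\n";
} elseif (is_utf8($cwd)) {
    echo "getcwd() returned well-formed UTF-8\n";
} else {
    // Consistent with the report: the path is in some other (native) encoding.
    echo "getcwd() returned bytes that are not well-formed UTF-8\n";
}
```

A pure-ASCII path passes this check under any codepage, so the result only says something when the current directory name contains non-ASCII characters.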
 [2017-08-11 13:02 UTC] requinix@php.net
-Status: Feedback +Status: Open
 [2017-08-11 13:02 UTC] requinix@php.net
I can get the correct result and various different incorrect results, but not your result.

You're running PHP 7.1.8 and not an earlier version? What does echo `chcp` show?
Have you looked at the comments in other similar bug reports (like #74589, #72555, and your earlier #73716) for anything that appears relevant?
 [2017-08-11 15:19 UTC] anrdaemon at freemail dot ru
Bug #72555 (the parent of those mentioned) is about handling the console codepage.
The present issue is strictly internal.
I modified the test script a little and supplied example output from running it with "php -nf ..."

http://www.rootdir.org/upload/Bugs/PHP-75063/php-75063.zip
 [2017-08-12 14:30 UTC] ajf@php.net
-Assigned To: +Assigned To: ab
 [2017-08-12 17:51 UTC] ab@php.net
-Status: Assigned +Status: Feedback
 [2017-08-12 17:51 UTC] ab@php.net
@anrdaemon which exact build did you use?

From the supplied archive - the folder in the root is shown as "ΓÑßΓ-75063". It might be either a packaging error, or possibly something on the HD that could be a cause. If the latter, what kind of FS is that: local HD, NTFS, Samba, etc.?

Thanks.
 [2017-08-13 13:53 UTC] anrdaemon at freemail dot ru
My apologies, it's a zip issue. I totally forgot that there's no standard way to encode file names in zip archives.

The original name was "тест-75063".
I reuploaded the archive for your convenience. http://www.rootdir.org/upload/Bugs/PHP-75063/php-75063.tar.xz

The build I'm using is the 7.1.8 64-bit TS release from http://windows.php.net/download#php-7.1

The "output.txt" contains the essential part of the phpinfo() output (minus credits and environment dump).
If anything in the environment could affect the results, please let me know and I'll fill the gaps.
 [2017-08-13 22:54 UTC] ab@php.net
Automatic comment on behalf of ab
Revision: http://git.php.net/?p=php-src.git;a=commit;h=3069ad8dd1400f19a28230d3028048a27d07ce8d
Log: Fixed bug #75063
 [2017-08-13 22:54 UTC] ab@php.net
-Status: Feedback +Status: Closed
 [2017-08-14 10:21 UTC] ab@php.net
-Summary: Many filesystem-related functions do not work with multibyte file names +Summary: Main CWD initialized with wrong codepage
 [2017-08-14 10:21 UTC] ab@php.net
@anrdaemon fixed in dev; the latest snapshots should reflect the fixed state.

Thanks.
 [2017-08-14 14:01 UTC] anrdaemon at freemail dot ru
Thanks, all seems to be working as expected.
At least filetype, getcwd, glob and simplexml_load_file all check out correctly.
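
For anyone repeating this check, a rough sketch of the scenario from this report: work inside a directory whose name contains multibyte characters and see what getcwd(), glob() and filetype() return. The function name check_multibyte_cwd() and the scratch directory name are illustrative assumptions, not something from the fix itself.

```php
<?php
// Hypothetical helper, not part of PHP: creates a scratch directory whose name
// contains multibyte characters, makes it the CWD, and records what the
// filesystem functions named in this report see from inside it.
// Save this file as UTF-8 so the literals below are UTF-8 byte sequences.
function check_multibyte_cwd(string $base): array
{
    $old = getcwd();
    $dir = $base . DIRECTORY_SEPARATOR . "тест-" . uniqid();
    if ($old === false || !mkdir($dir) || !chdir($dir)) {
        throw new RuntimeException("cannot set up scratch directory: $dir");
    }

    $names = ["test.txt", "тест.txt"];
    $result = ['cwd' => null, 'glob' => [], 'filetype' => []];
    try {
        foreach ($names as $name) {
            file_put_contents($name, "");
        }
        $result['cwd'] = getcwd();
        $result['glob'] = glob("*.txt");
        foreach ($names as $name) {
            // @ suppresses the "Lstat failed" warning; false marks a failure.
            $result['filetype'][$name] = @filetype($name);
        }
    } finally {
        // Best-effort cleanup; on an affected build some calls may fail.
        foreach ($names as $name) {
            @unlink($name);
        }
        chdir($old);
        @rmdir($dir);
    }
    return $result;
}

print_r(check_multibyte_cwd(sys_get_temp_dir()));
```

On a build with the fix, both file names should appear in the glob list and both filetype entries should be "file". On an affected build the multibyte entry is expected to be missing or false, as in the output posted earlier.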
 [2017-08-14 14:37 UTC] ab@php.net
Great, thanks for the check. There are discrepancies between TS and NTS by nature, as this case reveals. UTF-8 is still the best option when it comes to stream I/O, though for setups ASCII might still be the safer choice. The fix will likely land in the upcoming RC.

Thanks.
 [2017-08-14 15:32 UTC] anrdaemon at freemail dot ru
On Windows, I'm only concerned with the TS variant and am checking it specifically.
On *NIX, the situation is more consistent by nature and I'm not worried… much.
 
PHP Copyright © 2001-2017 The PHP Group
All rights reserved.
Last updated: Tue Aug 29 15:01:52 2017 UTC