Bug #54977 UTF-8 files and folder are not shown
Submitted: 2011-06-02 14:15 UTC
Modified: 2011-06-02 16:11 UTC
From: andrea dot rizzini at slidepath dot com
Assigned:
Status: Duplicate
Package: Filesystem function related
PHP Version: 5.2.17
OS: Windows 7
Private report: No
CVE-ID: None
 [2011-06-02 14:15 UTC] andrea dot rizzini at slidepath dot com
Description:
------------
Listing the contents of a directory with the standard PHP functions does not return correct names for files or folders whose names contain Unicode characters outside ISO-8859-1.

Windows 7 supports Unicode file names (stored as UTF-16LE, often called "UCS-2LE"), so you can create folders and files with names such as 王 ("king" in Japanese) and 汚れて掘る ("dirty dog" in Japanese).

If you list the directory containing these entries from PHP, their names come back as ?????, ?????.

After a bit of investigation I've noticed that PHP internally treats every string literal as ISO-8859-1.

I've tried changing the default encoding to UTF-8 in php.ini, but it does not work.



Test script:
---------------
Create several folders whose names use Chinese or Japanese characters.

List them using the standard PHP function opendir, like this:

<?php
$dirArray = array();
$myDirectory = opendir(".");
// get each entry; compare with false so an entry named "0" does not end the loop
while (($entryName = readdir($myDirectory)) !== false) {
	$dirArray[] = $entryName;
}
// close directory
closedir($myDirectory);
// count elements in array
$indexCount = count($dirArray);
print("$indexCount files<br>\n");


Notice that the entries with Chinese or Japanese names are returned as ???????



Expected result:
----------------
It should return the full list of directories, including directories with UTF-8 encoded names.

Actual result:
--------------
It only returns correct names for folders whose characters are all contained in ISO-8859-1.
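
A quick way to confirm that a listing has been affected is to look for "?" in the returned names: "?" is not a legal character in a Windows file name, so any entry containing it has most likely been through a lossy code-page conversion. The following is a minimal sketch of that check; findLossyNames() is a hypothetical helper written for this report, not a PHP function.

```php
<?php
/**
 * Return the entries whose names contain "?".
 * Assumption: the listing comes from Windows, where "?" cannot appear in a
 * real file name. findLossyNames() is a hypothetical helper, not part of PHP.
 */
function findLossyNames(array $names)
{
    $lossy = array();
    foreach ($names as $name) {
        if (strpos($name, '?') !== false) {
            $lossy[] = $name;
        }
    }
    return $lossy;
}

// Usage: collect the entries of the current directory and count the suspects.
$names = array();
$dir = opendir('.');
if ($dir !== false) {
    while (($entry = readdir($dir)) !== false) {
        $names[] = $entry;
    }
    closedir($dir);
}
print count(findLossyNames($names)) . " entries look mangled\n";
```

The check only detects the problem. The original names cannot be recovered from the "?" placeholders.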

History

 [2011-06-02 14:17 UTC] pajoye@php.net
-Status: Open +Status: Duplicate
 [2011-06-02 14:17 UTC] pajoye@php.net
There is already a request about supporting unicode FS on Windows.
 [2011-06-02 16:11 UTC] andrea dot rizzini at slidepath dot com
Well, as you know, Windows 7 shares its code base with Windows Server 2008 R2, while Windows Server 2008 is based on Vista. All of these systems store file names as Unicode (UTF-16LE, often called UCS-2LE).

Any idea when it will be fixed?

Could you suggest a workaround or a temporary fix?

I'm building a PHP extension with Boost and std::wstring to work around this shortfall. Do you have any suggestions?

Your help is much appreciated.

Regards

Andrea
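
On the workaround question above: one approach sometimes used on Windows is to enumerate the directory through the COM extension's Scripting.FileSystemObject, asking COM to hand strings back as UTF-8. The sketch below assumes the com_dotnet extension is enabled and has not been verified against PHP 5.2.17. listUnicodeNames() is a hypothetical helper name.

```php
<?php
/**
 * List folder and file names as UTF-8 strings via COM.
 * Assumptions: Windows host, com_dotnet extension loaded.
 * listUnicodeNames() is a hypothetical helper, not part of PHP.
 */
function listUnicodeNames($path)
{
    if (!class_exists('COM')) {
        throw new RuntimeException('The COM extension is not available.');
    }
    // CP_UTF8 asks COM to convert strings to and from UTF-8.
    $fso = new COM('Scripting.FileSystemObject', null, CP_UTF8);
    $folder = $fso->GetFolder($path);

    $names = array();
    foreach ($folder->SubFolders as $subFolder) {
        $names[] = $subFolder->Name;
    }
    foreach ($folder->Files as $file) {
        $names[] = $file->Name;
    }
    return $names;
}
```

Calling listUnicodeNames('C:\\some\\folder') would return the names as UTF-8 byte strings. They can be printed on a UTF-8 page, but passing them back to fopen() or opendir() in this PHP version may still fail, because those functions use the ANSI code page.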
 