Bug #30195 scandir etc cannot read Chinese file/folder name
Submitted: 2004-09-22 17:05 UTC
Modified: 2016-09-14 00:04 UTC
Votes:67
Avg. Score:4.6 ± 0.9
Reproduced:62 of 63 (98.4%)
Same Version:15 (24.2%)
Same OS:38 (61.3%)
From: percy at savant dot us
Assigned: ab
Status: Closed
Package: *Directory/Filesystem functions
PHP Version: 5CVS-2004-09-22 (dev)
OS: windows xp/2003
Private report: No
CVE-ID: None
 [2004-09-22 17:05 UTC] percy at savant dot us
Description:
------------
opendir()/readdir() and scandir() can only read English file and folder names; Chinese file and folder names come back as unrecognised characters.


 [2004-11-18 18:50 UTC] phoenix95 at hotmail dot com
I'm not 100% sure we have the same problem... What I experienced is readdir choking on directory names with special characters like an e acute.

I use PHP 4.3.9 in debug mode from the Komodo v.3 IDE on Windows XP Pro SP2 english. This happens with the PEAR package File_Find::mapTreeMultiple().
 [2004-12-27 15:39 UTC] dev at glossword dot info
There is the same problem with php 5.0.2.

How to reproduce:

1. Create file wшاアいנ.txt (urlencoded string is w%d1%88%d8%a7%e3%82%a2%e3%81%84%d7%a0.txt)
2. Read directory, readdir().
3. You'll get wø????.txt (w%c3%b8????.txt) instead of proper name.

With this bug, it is impossible to manage multilingual file names.
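
To confirm that the name in step 1 is well-formed UTF-8, the URL-encoded form can be decoded and inspected. This is only a small illustrative sketch: the variable names are made up for the example, and it assumes the mbstring extension is loaded.

```php
<?php
// URL-encoded file name quoted in the reproduction steps.
$encoded = 'w%d1%88%d8%a7%e3%82%a2%e3%81%84%d7%a0.txt';

// rawurldecode() turns each %XX escape back into one raw byte.
$name = rawurldecode($encoded);

// Check the byte sequence and compare byte length with character length.
$isUtf8    = mb_check_encoding($name, 'UTF-8');
$byteCount = strlen($name);
$charCount = mb_strlen($name, 'UTF-8');

printf("utf8=%s bytes=%d chars=%d\n", var_export($isUtf8, true), $byteCount, $charCount);
```

The name is 17 bytes but only 10 characters, so any API that maps it onto a single-byte code page has to drop or replace the characters that code page lacks.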
 [2005-01-28 18:02 UTC] moleary at preg dot org
This is also the case for Japanese characters on the Windows platform.  A note in the PHP manual says that opendir() uses ISO 8859-1 by default on Windows installations.

REPRO:
1) Install East Asian language support on XP through the Regional and Language Options control panel
2) add a file with Japanese characters as the name (I copied a couple strings from the asahi.com web site)

ACTUAL:
1) These same characters as text will be presented properly with the proper code page and/or header encoding.
2) The characters are not correctly parsed when looping through the contents of an opendir() on the directory in which you placed the file you created under the repro steps.

EXPECTED:
Ability to define the charset for opendir(), or at the very least use the more standard UTF-8 instead of ISO 8859-1, so that files with Asian names can have their proper names returned by opendir()
 [2005-02-17 15:20 UTC] moriyoshi@php.net
You might have shown strings of a native encoding as 
UTF-8 in your browser, most likely because of wrong 
Content-Type.

Try putting one of the following at the top of your 
script and let's see what'll happen:

CP936 (Simplified Chinese):

  header('Content-Type: text/html; charset=GB2312');

CP949 (Korean):

  header('Content-Type: text/html; charset=EUC-KR');

CP950 (Traditional Chinese):

  header('Content-Type: text/html; charset=BIG5');

CP932 (Japanese):

  header('Content-Type: text/html; charset=Shift_JIS');


 [2005-02-17 15:22 UTC] moriyoshi@php.net
Note that all of these are PHP code, so paste it within 
<?php ?>.

 [2005-02-25 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2006-07-10 10:02 UTC] gandhavallakiran at yahoo dot co dot in
Hi, I have used the Babel class in my PHP code, but it cannot read Chinese and Japanese characters (i.e. special characters); it displays blank space instead of the Chinese or Japanese text. Could you help me in this regard with how to display Chinese characters in PHP? It is very urgent, please help.
 [2007-03-31 23:30 UTC] missingno at ifrance dot com
Same problem here.

On WinXP with PHP 5.2.0, using iso-8859-1 as charset for the system (though the filesystem uses utf-8 for folders/files names).

I need to access folders whose names are encoded using UTF-8.
readdir/scandir won't allow me to do so (returning '?' for characters outside the system charset).


The page is served like this:
header('Content-Type: text/html; charset=utf-8');
So the browser really isn't at fault.
Serving the document with a more specific charset is not an option since I have to display texts in many different languages on the page.

As moleary at preg dot org suggested, it would be really nice to have an option to force PHP to use a certain encoding while accessing the filesystem. Or maybe, make it so that it uses the same encoding as the filesystem instead of defaulting to iso-8859-1...
 [2010-08-29 18:13 UTC] onekamil at gmail dot com
Hi, I have the same problem, and my solution is to use mb_convert_encoding.

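$open = opendir($path);
while (($value = readdir($open)) !== false)
{
   // Convert to UTF-8 from whichever encoding in the detect order matches.
   $value = mb_convert_encoding($value, "UTF-8", mb_detect_order());
}
closedir($open);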

When saving a file to the folder, run its name through urlencode; to display the name, run it through urldecode.
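
The urlencode idea above can be sketched roughly as follows. The helper names save_with_encoded_name() and list_decoded_names() are hypothetical, as is the scratch directory; only rawurlencode(), rawurldecode() and the filesystem calls are PHP built-ins. The script itself is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: store a file under a pure-ASCII, URL-encoded
// version of its UTF-8 name, so the on-disk name never leaves ASCII.
function save_with_encoded_name($dir, $utf8Name, $contents)
{
    $path = $dir . DIRECTORY_SEPARATOR . rawurlencode($utf8Name);
    file_put_contents($path, $contents);
    return $path;
}

// Hypothetical helper: list a directory and decode each entry back
// to its original UTF-8 name for display.
function list_decoded_names($dir)
{
    $names = array();
    foreach (scandir($dir) as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $names[] = rawurldecode($entry);
    }
    return $names;
}

// Usage: a scratch directory under the system temp dir.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'bug30195_' . uniqid();
mkdir($dir);
save_with_encoded_name($dir, '中文.txt', 'hello');
$listed = list_decoded_names($dir);
print_r($listed);
```

This only helps for files created through such a helper. Files that already exist on disk with non-ASCII names are still returned with '?' by the affected PHP versions.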
 [2011-06-07 00:08 UTC] spidgorny at gmail dot com
Same problem (files with question marks) with Russian files in PHP 5.3.0 on Windows 7.
mb_convert_encoding() can't help converting question marks.
Maybe DirectoryIterator will help.
 [2011-11-04 12:45 UTC] martin dot keckeis1 at gmail dot com
It's a problem in PHP in general.

This could have been solved with PHP 6.0 (general multibyte support).

There are some workarounds on the internet that use the Windows console.
 [2014-09-20 16:36 UTC] bsy at breadlab dot net
None of the solutions above solved the problem in PHP 5.2.12.
I'm using Korean Windows 7 and failed to print Japanese characters.
But my system could print Korean, special characters (like ☆), and the characters that are used in both Korean and Japanese.

For example, 靑 is used in Korean and 青 is used in Japanese.
Then my system prints only the former.
 [2016-09-14 00:04 UTC] ab@php.net
-Status: No Feedback
+Status: Closed
-Assigned To:
+Assigned To: ab
 [2016-09-14 00:04 UTC] ab@php.net
Fixed in PHP 7.1. Please read UPGRADING.

thanks.
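
A minimal sketch of how the fix could be checked, assuming PHP 7.1 or later with default_charset left at its stock value of "UTF-8" and the script saved as UTF-8. The directory and file names are made up for the example; the exact conditions under which Windows paths are treated as UTF-8 should be taken from the UPGRADING notes rather than from this sketch.

```php
<?php
// Assumption: PHP >= 7.1 and default_charset = "UTF-8".
// The scratch directory and file name below are hypothetical.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'bug30195_utf8_' . uniqid();
mkdir($dir);

// Create a file whose name mixes several scripts, written as UTF-8.
$name = '中文_日本語_русский.txt';
touch($dir . DIRECTORY_SEPARATOR . $name);

// Read the directory back and drop the "." and ".." entries.
$entries = array_values(array_diff(scandir($dir), array('.', '..')));

// True when scandir() returned the name byte-for-byte as UTF-8.
var_dump($entries === array($name));
```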