Bug #35300 PHP cannot read file names with multibyte character encoding
Submitted: 2005-11-20 00:50 UTC
Modified: 2005-11-20 10:10 UTC
From: dmceo415 at yahoo dot com
Assigned:
Status: Not a bug
Package: *General Issues
PHP Version: 5.0.5
OS: Windows XP
Private report: No
CVE-ID: None

 [2005-11-20 00:50 UTC] dmceo415 at yahoo dot com
On an NTFS-formatted drive, Windows XP stores file names as 16-bit code units, so nearly every character of a file name takes two bytes. Specifically, the encoding is UTF-16LE.

PHP's "readdir" has no problem reading NTFS file names that consist only of characters included in the ISO-8859-1 repertoire. For these characters, the first UTF-16LE byte is the same as the ISO-8859-1 representation, and the second byte is simply 0x00, the null byte, which causes no damage.
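The byte layout described above can be checked with a short sketch. It assumes PHP 7 or later with the mbstring extension loaded; the helper name utf16le_hex is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper: hex dump of a string after conversion to UTF-16LE.
// Assumes the mbstring extension is available.
function utf16le_hex(string $text, string $fromEncoding): string
{
    return bin2hex(mb_convert_encoding($text, 'UTF-16LE', $fromEncoding));
}

// "A" and "é" given as ISO-8859-1 bytes: each byte is kept and followed by 0x00.
$latin = utf16le_hex("A\xE9", 'ISO-8859-1');

// U+4E2D (a Chinese character) given as UTF-8 bytes: both UTF-16LE bytes are non-zero.
$cjk = utf16le_hex("\xE4\xB8\xAD", 'UTF-8');

echo $latin, "\n"; // 4100e900
echo $cjk, "\n";   // 2d4e
```

For the ISO-8859-1 characters the low byte matches the original byte and the high byte is 00, while the Chinese character has no single-byte equivalent.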

However, for two-byte characters at higher code points - Chinese characters, for example - "readdir" fails.

To illustrate, I created a directory with two files. One of the files is named with Chinese characters; the other is named with ISO-8859-1 compatible characters. The following code:

if ($dir = opendir('.\TestFiles')) {
  while (false !== ($file = readdir($dir))) {
    echo $file . '<br>';
  }
  closedir($dir);
}

produces this output:


Clearly, "readdir" does not handle the two-byte Chinese characters correctly. Instead, it returns question marks. This is a big problem if one wants to use a PHP script to back up an NTFS-formatted drive.
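One plausible explanation for the question marks, which this report does not confirm, is that the file name is converted to a single-byte code page somewhere before PHP sees it, and characters with no equivalent are replaced. The sketch below only imitates that kind of lossy conversion with mbstring; it does not call "readdir", and the file name is made up for illustration.

```php
<?php
// Assumes the mbstring extension. The substitute character is set explicitly
// to "?" (0x3F) so the result does not depend on local configuration.
mb_substitute_character(0x3F);

// Hypothetical file name: two Chinese characters plus ".txt", as UTF-8 bytes.
$utf8Name = "\xE4\xB8\xAD\xE6\x96\x87.txt";

// ISO-8859-1 cannot represent the Chinese characters, so each becomes "?".
$narrowed = mb_convert_encoding($utf8Name, 'ISO-8859-1', 'UTF-8');

echo $narrowed, "\n"; // ??.txt
```

Once a name has been narrowed this way, the original characters cannot be recovered, so the resulting string cannot be used to open the file.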

(And, by the way, using mb_internal_encoding("UTF-16LE") does not solve the problem; this seems to have no impact on "readdir".)
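A small sketch of why that setting would not be expected to help: mb_internal_encoding() changes how the mb_* string functions interpret bytes, not the bytes a string already contains. The sample bytes are arbitrary and assume the mbstring extension.

```php
<?php
// Six arbitrary bytes (two Chinese characters when read as UTF-8).
$bytes = "\xE4\xB8\xAD\xE6\x96\x87";

mb_internal_encoding('UTF-8');
$asUtf8 = mb_strlen($bytes);   // counted as UTF-8 characters

mb_internal_encoding('UTF-16LE');
$asUtf16 = mb_strlen($bytes);  // same bytes, counted as 16-bit code units

$raw = strlen($bytes);         // byte count, unaffected by the setting

echo "$asUtf8 $asUtf16 $raw\n"; // 2 3 6
```

The string itself is identical in all three cases; only the interpretation applied by mb_strlen() differs.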

Expected result:
"readdir" should be able to handle UTF-16 (i.e., two-byte) characters

Actual result:
"readdir" does not interpret two-byte characters correctly


 [2005-11-20 10:10 UTC]
Please do not submit the same bug more than once. An existing
bug report already describes this very problem. Even if you feel
that your issue is somewhat different, the resolution is likely
to be the same. 

Thank you for your interest in PHP.

There are already feature requests for this. This is not a bug but a missing feature, and that feature will be in PHP 6.
