Bug #35300 PHP cannot read file names with multibyte character encoding
Submitted: 2005-11-20 00:50 UTC Modified: 2005-11-20 10:10 UTC
From: dmceo415 at yahoo dot com Assigned:
Status: Not a bug Package: *General Issues
PHP Version: 5.0.5 OS: Windows XP
Private report: No CVE-ID: None

 [2005-11-20 00:50 UTC] dmceo415 at yahoo dot com
Description:
------------
On an NTFS-formatted drive, Windows XP stores file names in UTF-16LE, which encodes each character of the Basic Multilingual Plane (including common Chinese characters) in two bytes.

PHP's "readdir" has no problem reading NTFS file names that consist only of characters included in the ISO-8859-1 repertoire. For these characters, the first UTF-16LE byte is the same as the ISO-8859-1 representation, and the second byte is simply 0x00, the null byte, which causes no damage.
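The byte relationship described above can be checked with a short sketch. It assumes only characters in the Basic Multilingual Plane (U+0000 to U+FFFF, excluding surrogates), where UTF-16LE uses a single 16-bit unit stored low byte first. The helper `utf16leUnit()` is hypothetical and written for this illustration; it relies only on PHP's built-in `pack()` and `bin2hex()`.

```php
<?php
// Sketch: encode one BMP code point as the two UTF-16LE bytes NTFS stores.
// utf16leUnit() is a hypothetical helper for illustration only.
function utf16leUnit($codePoint)
{
    if ($codePoint < 0 || $codePoint > 0xFFFF
        || ($codePoint >= 0xD800 && $codePoint <= 0xDFFF)) {
        throw new InvalidArgumentException('not a BMP scalar value');
    }
    // 'v' = unsigned 16-bit little-endian: low byte first, high byte second
    return pack('v', $codePoint);
}

$samples = array(
    'G (U+0047)'       => 0x0047, // in ISO-8859-1
    'e-acute (U+00E9)' => 0x00E9, // in ISO-8859-1
    'zhong (U+4E2D)'   => 0x4E2D, // Chinese, not in ISO-8859-1
);

foreach ($samples as $label => $cp) {
    $bytes = utf16leUnit($cp);
    // For U+0000..U+00FF the first byte equals the ISO-8859-1 byte
    // and the second byte is 0x00; for higher code points it is not.
    printf("%-18s UTF-16LE bytes: %s  second byte is 0x00: %s\n",
        $label, bin2hex($bytes), $bytes[1] === "\x00" ? 'yes' : 'no');
}
```

For "G" this prints the bytes `4700`, and for U+4E2D it prints `2d4e`, whose second byte is not null and which has no ISO-8859-1 equivalent at all.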

However, "readdir" fails for characters at higher code points (above U+00FF), such as Chinese characters.

To illustrate, I created a directory with two files. One of the files is named with Chinese characters; the other is named with ISO-8859-1 compatible characters. The following code:

<?php
// List every entry in the TestFiles subdirectory
if ($dir = opendir('.\TestFiles')) {
  while (false !== ($file = readdir($dir))) {
    echo $file . '<br>';
  }
  closedir($dir);
}

produces this output:

.
..
???????.m4a
Greensleeves.m4a

Clearly, "readdir" does not handle the Chinese characters correctly: it returns question marks in their place. This is a serious problem for anyone who wants to use a PHP script to back up an NTFS-formatted drive.

(And, by the way, using mb_internal_encoding("UTF-16LE") does not solve the problem; this seems to have no impact on "readdir".)
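Until "readdir" returns these names intact, a backup script can at least detect the affected entries. Windows does not allow "?" in file names, so a "?" in a name returned by "readdir" most likely marks a character that was replaced and a file that cannot be reopened under that name. The sketch below assumes that reasoning holds; `findLossyNames()` is a hypothetical helper, not part of PHP.

```php
<?php
// Sketch: collect directory entries whose names contain '?'.
// On Windows '?' is not a legal file-name character, so such a name was
// most likely mangled by readdir(). findLossyNames() is hypothetical.
function findLossyNames($path)
{
    $lossy = array();
    if (!is_dir($path) || ($dir = opendir($path)) === false) {
        return $lossy; // nothing to scan
    }
    while (false !== ($file = readdir($dir))) {
        if (strpos($file, '?') !== false) {
            $lossy[] = $file;
        }
    }
    closedir($dir);
    return $lossy;
}

foreach (findLossyNames('.\TestFiles') as $name) {
    echo 'Cannot back up (name was mangled): ' . $name . "\n";
}
```

This only reports the problem; it does not recover the original names.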

Expected result:
----------------
"readdir" should correctly handle file names containing any UTF-16 character, not only those in the ISO-8859-1 range.

Actual result:
--------------
"readdir" returns question marks in place of the Chinese characters.

History

 [2005-11-20 10:10 UTC] sniper@php.net
Please do not submit the same bug more than once. An existing
bug report already describes this very problem. Even if you feel
that your issue is somewhat different, the resolution is likely
to be the same. 

Thank you for your interest in PHP.

There are already feature requests for this. This is not a bug but a missing feature, and it will be in PHP 6.

 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Nov 23 11:01:28 2024 UTC