[2005-11-20 10:10 UTC] sniper@php.net
Description:
------------
On an NTFS-formatted drive, Windows XP stores file names as UTF-16LE, using two bytes per code unit.

PHP's "readdir" has no problem reading NTFS file names that consist only of characters in the ISO-8859-1 repertoire. For these characters, the first UTF-16LE byte is the same as the ISO-8859-1 representation, and the second byte is simply 0x00, the null byte, which causes no damage.

However, for characters at higher code points, whose second byte is not 0x00 (Chinese characters, for example), "readdir" fails.

To illustrate, I created a directory with two files. One of the files is named with Chinese characters; the other is named with ISO-8859-1 compatible characters.

The following code:

if ($dir = opendir('.\TestFiles')) {
    while (false !== ($file = readdir($dir))) {
        echo $file . '<br>';
    }
}

produces this output:

.
..
???????.m4a
Greensleeves.m4a

Clearly, "readdir" does not handle the Chinese characters correctly. Instead, it returns a question mark for each of them. This is a big problem if one wants to use a PHP script to back up an NTFS-formatted drive.

(And, by the way, calling mb_internal_encoding("UTF-16LE") does not solve the problem; it seems to have no impact on "readdir".)

Expected result:
----------------
"readdir" should return file names containing UTF-16 (i.e., two-byte) characters intact.

Actual result:
--------------
"readdir" does not interpret two-byte characters correctly; it returns "?" in place of each one.
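
The byte layout described in the report can be checked with a short sketch. It is not part of the original report: the helper name utf16le_hex is hypothetical, the mbstring extension is assumed to be loaded, and the "\u{...}" escape assumes PHP 7 or later. The sketch only shows how the two kinds of characters are encoded; it does not exercise "readdir" itself.

```php
<?php
// Hypothetical helper (not from the report): returns the UTF-16LE bytes of a
// UTF-8 string as hex, grouped into two-byte code units.
// Assumes the mbstring extension is available.
function utf16le_hex(string $utf8): string
{
    $utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
    return implode(' ', str_split(bin2hex($utf16), 4));
}

// ISO-8859-1 compatible characters: the second byte of each unit is 00.
echo utf16le_hex('Gr'), "\n";       // 4700 7200

// A Chinese character (U+4E2D): neither byte is 00.
echo utf16le_hex("\u{4E2D}"), "\n"; // 2d4e
```

The first line of output shows why names such as Greensleeves.m4a survive a byte-oriented reading, while the second shows a code unit that has no single-byte ISO-8859-1 equivalent.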