Bug #49475 readdir fails for Unicode filenames
Submitted: 2009-09-05 21:57 UTC Modified: 2009-09-05 23:38 UTC
From: elmue at gmx dot de Assigned:
Status: Not a bug Package: *Unicode Issues
PHP Version: 6SVN-2009-09-05 (snap) OS: Windows
Private report: No CVE-ID: None
 [2009-09-05 21:57 UTC] elmue at gmx dot de
Description:
------------
Hello

I have PHP6 - VC6 compiled on 3. Sept 2009.

How to reproduce the bug:

Create a file:
C:\Temp\Tést.txt
(note the accent on the e)

Execute the code below.

What happens is the warning:
"Could not convert binary string to Unicode string (converter UTF-8 failed on bytes (0xE9) at offset 1)"

(0xE9 is the code of the 'é' character in the Windows-1252 / Latin-1 code page; it is not an ASCII code)

and an empty string is returned in $File.
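
A minimal sketch of why a UTF-8 converter would reject this name, assuming the filename reaches PHP as raw Windows-1252 bytes (the variable names are only illustrative, and the mbstring extension is assumed to be loaded):

```php
<?php
// Hypothetical illustration: the name "Tést.txt" as raw Windows-1252 bytes.
$ansiName = "T\xE9st.txt";

// In UTF-8, 0xE9 is a lead byte that must be followed by continuation bytes.
// Here it is followed by "s", so the byte string is not valid UTF-8.
$isUtf8 = mb_check_encoding($ansiName, 'UTF-8');

// Decoding with the encoding the bytes were actually written in succeeds.
$utf8Name = mb_convert_encoding($ansiName, 'UTF-8', 'Windows-1252');

var_dump($isUtf8);              // bool(false)
echo bin2hex($utf8Name), "\n";  // 54c3a973742e747874
```

In the hex dump the single byte E9 has become the two-byte UTF-8 sequence C3 A9.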

If the filename contains Russian or Greek characters it is even worse:
in this case no warning is displayed and the filename is returned as "??????.txt"
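
The question marks look like a lossy conversion to a single-byte code page. The sketch below reproduces the same effect with mbstring; it is only an analogy, assuming a Western (Windows-1252) code page and a hypothetical Russian filename, and it does not show what PHP does internally:

```php
<?php
// Hypothetical file name "Привет.txt", written with escapes so the example
// does not depend on the encoding of this source file.
$name = "\u{41F}\u{440}\u{438}\u{432}\u{435}\u{442}.txt";

// Make the replacement character explicit ('?' is 0x3F).
mb_substitute_character(0x3F);

// Windows-1252 has no Cyrillic letters, so each one is replaced.
$lossy = mb_convert_encoding($name, 'Windows-1252', 'UTF-8');

echo $lossy, "\n"; // ??????.txt
```

Once the characters have been replaced, the original name cannot be recovered from the result.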

This warning message makes no sense.
All Windows operating systems store filenames in Unicode, except Windows 95, 98 and ME, which are out of date.

So there is no reason to put the filename into a UTF-8 converter, as the warning says.
No conversion is required on Windows if the correct API is used. Windows offers the old FindFirstFileA(...) API and the Unicode FindFirstFileW(...) API. I hope that the PHP programmers did not make the error of using the ANSI versions, which are codepage dependent and produce a lot of problems.

The Wide API like FindFirstFileW(...) returns ALL filenames directly in Unicode. There is NO CONVERSION required on Windows and there is NO UTF-8 converter required.

I also played around with different settings for
ini_set("unicode.filesystem_encoding", "...")

but the error stays the same.
There is a design error deep in the code.

Elmü


Reproduce code:
---------------
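<?php
$hDir = opendir("C:\\Temp");
if ($hDir === false) {
    die("Could not open directory");
}
// readdir() returns false once all entries have been read
while (($File = readdir($hDir)) !== false)
{
    // <--- the readdir() call above produces the warning
    echo "File=$File<br>";
}
closedir($hDir);
?>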

Expected result:
----------------
correct filename
no warning

Actual result:
--------------
the file is returned as empty string or as "?????.txt"

 [2009-09-05 22:05 UTC] pajoye@php.net
There is already a feature request about that. In the meantime you can use mbstring to convert to your local encoding (check your preferences and verify which encoding you have to use). Real Unicode support for file operations will not be available soon, though: early next year at the soonest.
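
A rough sketch of that mbstring workaround, assuming readdir() hands back the raw bytes in the local code page (as it does for accented Latin names on PHP 5) and that the code page is Windows-1252. Both function names are hypothetical helpers, not part of PHP:

```php
<?php
/**
 * Convert a raw file name to UTF-8.
 * $localEncoding is an assumption; use the code page your system really uses.
 */
function fileNameToUtf8(string $raw, string $localEncoding = 'Windows-1252'): string
{
    // Names that are already valid UTF-8 (including plain ASCII) pass through.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', $localEncoding);
}

/** List a directory with every entry name converted to UTF-8. */
function listDirUtf8(string $dir, string $localEncoding = 'Windows-1252'): array
{
    $names = [];
    $handle = opendir($dir);
    if ($handle === false) {
        return $names; // directory could not be opened
    }
    while (($entry = readdir($handle)) !== false) {
        $names[] = fileNameToUtf8($entry, $localEncoding);
    }
    closedir($handle);
    return $names;
}
```

It could be called as print_r(listDirUtf8("C:\\Temp")). This only helps when readdir() actually returns the original bytes; a name that has already been reduced to an empty string or to question marks cannot be repaired this way.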

 [2009-09-05 22:41 UTC] elmue at gmx dot de
Hello

> In the meantime you can use mbstring to convert to your local encoding

I suppose that you did not read what I explained.

How do I use mbstring to convert anything if an empty string is returned?
mbstring can only convert an empty string into another empty string!
This would not be very useful!
And mbstring also can't convert "??????.txt" into anything useful.

The code that I posted works fine on PHP 5 (at least if I don't use Greek or Russian characters), but on PHP 6 it is broken.

There is no way!
On PHP 6 you currently can't work with filenames that have an accent or umlaut. It's worse than PHP 5.

Elmü
 [2009-09-05 23:38 UTC] pajoye@php.net
That's the same, unless we change file ops to return a binary string for any string representing a path.
 