[2011-02-15 16:51 UTC] schmale at froglogic dot com
Description:
------------
Notice: This problem affects ONLY the CLI interpreter, NOT the CGI.
When using dir('path/to/dir'), the read() method does not return UTF-8 if the directory contains names with, e.g., umlauts (ä, ö, ü). I tested this on Linux and Windows, with both CGI and CLI, and the problem occurs only with Windows/CLI.
Test script:
---------------
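<?php
// Placeholder path: any directory with entries whose names contain umlauts
$path = 'path/to/directory/which/contains/umlauts';
$directory = dir($path);
while (false !== ($content = $directory->read())) {
    // Report every entry name that is not valid UTF-8
    if (mb_check_encoding($content, 'UTF-8') === false) {
        fprintf(STDERR, "Returned non-utf-8 (%s)\n", $content);
    }
}
$directory->close();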
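Printing a raw entry name to the console does not show which bytes it actually contains, because the terminal decodes them with its own code page. The following is a minimal diagnostic sketch that assumes mbstring is available. The helper nonUtf8Names() is hypothetical and not part of the report. It maps each non-UTF-8 name to a hex dump of its bytes. A single `fc` byte for "ü" points to a Latin-1/CP1252 name, whereas UTF-8 would show `c3bc`.

```php
<?php
// Hypothetical helper (not from the report): given directory entry names,
// return those that are not valid UTF-8, mapped to a hex dump of their bytes.
function nonUtf8Names(iterable $names): array
{
    $report = [];
    foreach ($names as $name) {
        if (!mb_check_encoding($name, 'UTF-8')) {
            // bin2hex() makes the raw bytes visible regardless of console encoding
            $report[$name] = bin2hex($name);
        }
    }
    return $report;
}

// Usage with the directory from the report (path is a placeholder):
// print_r(nonUtf8Names(scandir('path/to/directory/which/contains/umlauts')));
```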
Expected result:
----------------
The expected result, of course, was that the return value of read() is always UTF-8 encoded, i.e. no messages are printed when the script runs.
Actual result:
--------------
If a subdirectory name contains umlauts (or, I guess, any non-ASCII character), a message is printed, i.e. the return value is not UTF-8 encoded.
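Later comments in this report note that PHP uses the Windows ANSI APIs, so names come back in the process's runtime code page. If that is the case, a caller could convert the names before printing them. This is a hedged sketch. toUtf8() is a hypothetical helper, and the default 'Windows-1252' is only an assumption, because the actual code page depends on the system and process settings.

```php
<?php
// Hypothetical workaround: convert an entry name to UTF-8, assuming that any
// name which is not already valid UTF-8 is in $assumedCodePage.
// 'Windows-1252' is an assumed default, not something PHP guarantees.
function toUtf8(string $name, string $assumedCodePage = 'Windows-1252'): string
{
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name; // already valid UTF-8 (this includes plain ASCII)
    }
    return mb_convert_encoding($name, 'UTF-8', $assumedCodePage);
}

// e.g. toUtf8("Startmen\xFC") yields the UTF-8 string "Startmenü"
```

Under this assumption, keep the raw name for filesystem calls, because the ANSI API expects those bytes, and use the converted string only for output. The validity check is a heuristic. A legacy-encoded name that happens to form valid UTF-8 is returned unchanged.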
> and the problem does only occur with Windows/CLI.

I see no difference between CGI and CLI (both executed from the shell). But something is curious:

    <?php
    $directory = dir(getenv('USERPROFILE'));
    while (false !== ($content = $directory->read())) {
        if (mb_check_encoding($content, 'UTF-8') === false) {
            printf('Returned non-utf-8 (%s)', $content);
            printf(" Encoding: %s\r\n", mb_detect_encoding($content));
        }
    }
    ?>

And the output is:

    Returned non-utf-8 (Startmenü) Encoding: UTF-8

Regards,
Carsten

> Windows supports UCS-2 internally via the wide char APIs.

I know... I'm just wondering why mb_detect_encoding($content) returns 'UTF-8' while mb_check_encoding($content, 'UTF-8') returns FALSE.

Also, I think there is another problem:

    C:\Users\Carsten Wiedmann>php -r "echo realpath('.');"
    C:\Users\Carsten Wiedmann
    C:\Users\Carsten Wiedmann>cd Startmenü

    C:\Users\Carsten Wiedmann\Startmenü>php -r "echo realpath('.');"

    C:\Users\Carsten Wiedmann\Startmenü>

Regards,
Carsten

> PHP relies on the ANSI APIs and the encoding is then the runtime encoding
> (whatever is set for the running process or system wide).

"Startmenü" can be accessed without any problems through the ANSI API. An "ü" exists in CP437, CP850 and CP1252 (just use the chcp command), so I'm not talking about Unicode. You can also test this with a small piece of C code. So why does

    php -r "echo realpath('.');"

return false?
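This relates to the question above about mb_detect_encoding() versus mb_check_encoding(). According to the PHP manual, mb_detect_encoding() in its default non-strict mode returns the closest matching candidate even when the string is valid in none of them. In strict mode it returns false instead. That would be consistent with the "Encoding: UTF-8" output reported above. The following is a minimal sketch that assumes the entry name is the single-byte string "Startmen\xFC" (a CP1252/Latin-1 "ü"). The exact non-strict result may differ between PHP versions.

```php
<?php
// Entry name as raw bytes: "Startmen" followed by 0xFC, which is "ü" in
// CP1252/Latin-1 but an invalid byte sequence in UTF-8.
$name = "Startmen\xFC";

// Strict detection: the string is valid in neither candidate, so false.
$strict = mb_detect_encoding($name, ['ASCII', 'UTF-8'], true);

// mb_check_encoding() is a strict validity check as well, so also false.
$check = mb_check_encoding($name, 'UTF-8');

// Non-strict (default) detection returns the closest candidate instead of
// false; the exact result may vary between PHP versions.
$loose = mb_detect_encoding($name, ['ASCII', 'UTF-8'], false);

var_dump($strict, $check, $loose);
```

To decide whether a name really is UTF-8, use mb_check_encoding() or strict detection rather than the default non-strict mb_detect_encoding().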