Bug #79455 scandir() mishandles UTF-8 character(s)
Submitted: 2020-04-07 14:04 UTC
Modified: 2020-04-16 10:25 UTC
From: pguest at meaa dot mea dot com
Assigned: cmb
Status: Not a bug
Package: Directory function related
PHP Version: 7.4.4
OS: Windows 10
Private report: No
CVE-ID: None
 [2020-04-07 14:04 UTC] pguest at meaa dot mea dot com
Description:
------------
A Windows filesystem directory containing the character:
á, 225, 0xE1

is returned from scandir as two characters:
Ã, 195, 0xC3
¡, 161, 0xA1

In other words:
'Arquivo dos Gráficos'
becomes:
'Arquivo dos GrÃ¡ficos'
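
The byte values reported above are what one would see if the UTF-8 encoding of "á" (the two bytes 0xC3 0xA1) were read back one byte at a time as Windows-1252. The following is a minimal sketch of that reinterpretation, assuming the mbstring extension is loaded. It illustrates the symptom only and is not a confirmed diagnosis of what scandir() does internally.

```php
<?php
// "á" (U+00E1) encoded as UTF-8 is the two bytes 0xC3 0xA1.
$utf8 = "\xC3\xA1";

// Assumption for illustration: treat those same bytes as Windows-1252
// text and re-encode the result as UTF-8 so it can be printed.
$mojibake = mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');

printf("UTF-8 bytes of the original: %s\n", strtoupper(bin2hex($utf8)));
printf("read as Windows-1252:        %s\n", $mojibake);

// List each resulting character with its code point.
foreach (mb_str_split($mojibake, 1, 'UTF-8') as $ch) {
    $cp = mb_ord($ch, 'UTF-8');
    printf("  %s, %d, 0x%X\n", $ch, $cp, $cp);
}
```

The two characters listed are Ã (195, 0xC3) and ¡ (161, 0xA1), matching the values in the description.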

Test script:
---------------
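<?php

// Start from a clean C:/temp/ that holds one directory with a non-ASCII name.
// Note: rmdir() only succeeds on empty directories.
$rootdir = 'C:/temp/';
$portuguese_string = 'Arquivo dos Gráficos';
$newdir = $rootdir . $portuguese_string;
if (is_dir($newdir)) {
    rmdir($newdir);
}
if (is_dir($rootdir)) {
    rmdir($rootdir);
}

mkdir($rootdir);
mkdir($newdir);

// Compare the name as written in the script with the name scandir() reads back.
printf("'%s' encoding: %s\n", $portuguese_string, mb_detect_encoding($portuguese_string));
$scandir_return = scandir($rootdir);
echo "scandir() returns: " . print_r($scandir_return, true);
// Index 2 is the first real entry; indexes 0 and 1 are "." and "..".
printf("'%s' encoding: %s\n", $scandir_return[2], mb_detect_encoding($scandir_return[2]));

?>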
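
One caveat about the test script: mb_detect_encoding() is likely to report UTF-8 for both the intended name and the mangled one, because the mangled name is itself valid UTF-8. A hex dump separates the two more reliably. The sketch below uses hard-coded byte strings that stand in for the expected and observed names (they are illustrative values, not captured output) and assumes the mbstring extension is loaded.

```php
<?php
// Illustrative stand-ins for the two names, written as explicit bytes so the
// result does not depend on how this file itself is encoded.
$expected = "Arquivo dos Gr\xC3\xA1ficos";          // "á" as UTF-8
$observed = "Arquivo dos Gr\xC3\x83\xC2\xA1ficos";  // "Ã¡" as UTF-8

foreach (['expected' => $expected, 'observed' => $observed] as $label => $name) {
    // Strict detection against an explicit candidate list.
    $encoding = mb_detect_encoding($name, ['ASCII', 'UTF-8'], true);
    printf("%-9s %-6s %2d bytes  %s\n", $label, $encoding, strlen($name), bin2hex($name));
}
```

Both rows are detected as UTF-8, but the observed name is two bytes longer and its hex dump contains c383c2a1 where the expected name has c3a1.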


Expected result:
----------------
The expectation is that scandir() should read back the directory string with the same UTF-8 characters involved in its creation.

Actual result:
--------------
'Arquivo dos Gráficos'
becomes:
'Arquivo dos GrÃ¡ficos'

History

 [2020-04-07 14:18 UTC] salathe@php.net
-Package: *Directory Services problems +Package: Directory function related
 [2020-04-07 14:47 UTC] cmb@php.net
-Status: Open +Status: Feedback -Assigned To: +Assigned To: cmb
 [2020-04-07 14:47 UTC] cmb@php.net
I cannot reproduce this.  Please provide the output of
the following script:

<?php
var_dump(
    ini_get('internal_encoding'),
    ini_get('default_charset'),
    ini_get('zend.multibyte'),
    sapi_windows_cp_get('ansi'),
    sapi_windows_cp_get('oem')
);
?>
 [2020-04-07 14:53 UTC] pguest at meaa dot mea dot com
-Status: Feedback +Status: Assigned
 [2020-04-07 14:53 UTC] pguest at meaa dot mea dot com
cmb,
Here is output from:
<?php
var_dump(
    ini_get('internal_encoding'),
    ini_get('default_charset'),
    ini_get('zend.multibyte'),
    sapi_windows_cp_get('ansi'),
    sapi_windows_cp_get('oem')
);
?>

C:\Workspace\PDT\__CORE\php_cmb.php:7:
string(0) ""
C:\Workspace\PDT\__CORE\php_cmb.php:7:
string(5) "UTF-8"
C:\Workspace\PDT\__CORE\php_cmb.php:7:
string(1) "0"
C:\Workspace\PDT\__CORE\php_cmb.php:7:
int(1252)
C:\Workspace\PDT\__CORE\php_cmb.php:7:
int(437)
 [2020-04-08 07:37 UTC] cmb@php.net
-Status: Assigned +Status: Feedback
 [2020-04-08 07:37 UTC] cmb@php.net
Thanks for the info!  So apparently your script is UTF-8 encoded,
and you're using the default INI settings, which is supposed to
produce the desired filenames, but it works as if there was a call
to `sapi_windows_cp_set(1252)` at the beginning of the script.  Is
there a respective auto_prepend_file?

Anyhow, I suppose that adding

    sapi_windows_cp_set(65001);

at the top of the script should enforce the desired behavior.

By the way, which SAPI do you use?
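
The suggestion in the comment above could be wrapped as follows so that the same script also runs on other platforms. This is a minimal sketch: `force_utf8_codepage()` is a hypothetical helper name, not a PHP function, and whether the call resolves the problem in a particular environment (such as an IDE console) is not verified here.

```php
<?php
/**
 * Hypothetical helper: switch the process code page to UTF-8 (65001)
 * where the Windows-only sapi_windows_cp_set() is available.
 * Returns false when there is nothing to switch or the call fails.
 */
function force_utf8_codepage(): bool
{
    if (PHP_OS_FAMILY !== 'Windows' || !function_exists('sapi_windows_cp_set')) {
        return false;
    }
    return sapi_windows_cp_set(65001);
}

// Call it before any filesystem function that returns names.
$switched = force_utf8_codepage();
printf("code page switched: %s\n", $switched ? 'yes' : 'no');

foreach (scandir(__DIR__) as $entry) {
    if ($entry === '.' || $entry === '..') {
        continue; // skip the two pseudo-entries
    }
    // Hex output makes the bytes of each name visible regardless of console encoding.
    printf("%s  %s\n", bin2hex($entry), $entry);
}
```

Guarding on `function_exists()` keeps the script from failing with an undefined-function error on non-Windows builds, where the `sapi_windows_cp_*` functions are not defined.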
 [2020-04-09 14:47 UTC] pguest at meaa dot mea dot com
I use CLI SAPI within Eclipse PDT IDE.  I do not explicitly involve an auto_prepend_file within php.ini.  I am not sure whether the Eclipse environment invokes certain configuration files prior to a run configuration launch.
However I am thankful for your suggestion for use of sapi_windows_cp_set() within my scripts.  This dialog with you has helped educate me in regard to character encoding and code pages.  Thank you!
 [2020-04-16 10:25 UTC] cmb@php.net
-Status: Feedback +Status: Not a bug
 [2020-04-16 10:25 UTC] cmb@php.net
> I use CLI SAPI within Eclipse PDT IDE.

Ah, that custom console layer likely explains the issue.  Great
that you have been able to resolve the problem!
 