Bug #74927 UTF8 characters wrong processed in file system
Submitted: 2017-07-14 19:52 UTC Modified: 2017-07-18 21:22 UTC
From: furun at arcor dot de Assigned:
Status: Not a bug Package: *Directory/Filesystem functions
PHP Version: 7.1.7 OS: Win7 and Linux
Private report: No CVE-ID: None
 [2017-07-14 19:52 UTC] furun at arcor dot de
Description:
------------
UTF-8 characters are processed incorrectly by filesystem functions

PHP 7.1.7, Win7, Xampp
PHP 7.1.5, Linux

PHP has problems processing UTF-8 characters in file names correctly.
Several file functions may be affected, but I tested file_put_contents, file_get_contents and glob.
file_put_contents and file_get_contents write and read the files correctly,
but glob does not list them correctly.

(Is there an .htaccess, PHP or system option to fix this, or is it a problem in the PHP code? I think it is a PHP bug.)
(Does PHP 7 still have very weak support for UTF-8 encoding? Maybe all file functions should be tested.)


Results:
On both systems (Windows 7 and Linux) the files are created correctly,
but they are not read back correctly with glob().


file_put_contents, file_get_contents (OK):
!äÄ
ä!Ä
äÄ!
ä!Ä.txt
äÄ.txt
 ÿ.txt
Āﻼ.txt
أ¤أ„.tx2


glob() (BUG):

PHP_OS: WINNT
 ÿ.txt	8
ä!Ä.txt	9
äÄ.txt	8
Āﻼ.txt	9
أ¤أ„.tx2	13

PHP_OS: Linux
.txt	4
!Ä.txt	7
.txt	4
.txt	4
.tx2	4


Test script:
---------------

define('DIR_BASE', realpath(dirname(__FILE__) . DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR);
define('DIR_TEMP', DIR_BASE . 'temp' . DIRECTORY_SEPARATOR);


print(DIR_BASE . '<br>');
print(DIR_TEMP . '<br>');
print('<br>');
print('PHP_OS: ' . PHP_OS . '<br><br>');


try {
	if (! file_exists(DIR_TEMP) && ! is_dir(DIR_TEMP)) {
		$check = mkdir(DIR_TEMP, 0755); 
		if ($check) $check = chmod(DIR_TEMP, 0755);
	}
} catch (Exception $exception) {
}



print('file_put_contents, file_get_contents:<br>');

$fileName = "!äÄ";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');

$fileName = "ä!Ä";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');

$fileName = "äÄ!";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');

$fileName = "ä!Ä.txt";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');


$fileName = "äÄ.txt";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');

$fileName = " ÿ.txt";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');

$fileName = "Āﻼ.txt";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');


$fileName = "äÄ.tx2";
$fileName = iconv('windows-1256', 'utf-8', $fileName);
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');


$data  = ''; 
$data .= '<table class="text documentlist sortable">' . "\n";
$data .= '<tbody>' . "\n";

print('<br><br>glob:<br>');
$fileList = glob(DIR_TEMP . '*.*', 0);
foreach ($fileList as $filePath) {
	$fileName = basename($filePath);
	
">
	$data .= 	'<tr>' .
				'<td>' . '<a href="temp/' . htmlentities(urlencode($fileName)) . '">' . $fileName . '</a></td>'.
				'<td>' . strlen($fileName) . '</td>'.
				'</tr>' . "\n"; //FIXME reading in
}

$data .= '</tbody>' . "\n";
$data .= '</table>' . "\n";
print($data);


Expected result:
----------------
All files are listed with their correct names.


 [2017-07-14 21:14 UTC] danack@php.net
-Status: Open +Status: Feedback
 [2017-07-14 21:14 UTC] danack@php.net
What "linux" system are you testing on?


If it's the alleged Linux sub-system on Windows 10, you are very likely to just be seeing Windows issues come through the alleged VM.


Testing on CentOS with the simpler code below doesn't show any problems, except that glob() doesn't return any files starting with an exclamation mark.



<?php

define('DIR_TEMP', __DIR__ . '/test/');
@mkdir(DIR_TEMP, 0755, true);
print('PHP_OS: ' . PHP_OS . '<br><br>');

$filenames = [
    "!äÄ",
    "!Hello_I_start_with_an_exclamation",
    " ÿ.txt",
    "Āﻼ.txt",
    "ä!Ä.txt",
    "äÄ.txt",
    iconv('windows-1256', 'utf-8', "äÄ.tx2")
];

foreach ($filenames as $filename) {
    $written = file_put_contents(DIR_TEMP . $filename, 'foo');
    if ($written === false) {
        echo "Failed to write file $filename \n";
    }

    if (file_exists(DIR_TEMP . $filename) === false) {
        echo "written but doesn't exist?\n";
    }
}

var_dump($filenames);
$fileList = glob(DIR_TEMP . '*.*', 0);
var_dump($fileList);




PHP_OS: Linux<br><br>array(7) {
  [0]=>
  string(5) "!äÄ"
  [1]=>
  string(34) "!Hello_I_start_with_an_exclamation"
  [2]=>
  string(7) " ÿ.txt"
  [3]=>
  string(9) "Āﻼ.txt"
  [4]=>
  string(9) "ä!Ä.txt"
  [5]=>
  string(8) "äÄ.txt"
  [6]=>
  string(13) "أ¤أ„.tx2"
}
array(5) {
  [0]=>
  string(58) "/testing/test/ ÿ.txt"
  [1]=>
  string(60) "/testing/test/Āﻼ.txt"
  [2]=>
  string(61) "/testing/test/ä!Ä.txt"
  [3]=>
  string(60) "/testing/test/äÄ.txt"
  [4]=>
  string(67) "/testing/test/أ¤أ„.tx2"
}
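One detail that may explain the two missing names in the output above: neither of them contains a dot, and the pattern '*.*' only matches names with a literal dot in them. An exclamation mark is not special in a glob pattern outside a bracket expression. The following is a minimal sketch to check this, not part of the original test; the scratch directory and the name 'with.dot' are assumptions made for illustration.

```php
<?php
// Hypothetical scratch directory, not taken from the report.
$dir = sys_get_temp_dir() . '/glob_dot_demo_' . uniqid();
mkdir($dir, 0755);

// One name without a dot (from the test above) and one with a dot.
foreach (['!Hello_I_start_with_an_exclamation', 'with.dot'] as $name) {
    file_put_contents($dir . '/' . $name, 'foo');
}

// Strip the directory prefix by byte offset so basename() is not involved.
$strip = function (array $paths) use ($dir) {
    $names = array_map(function ($path) use ($dir) {
        return substr($path, strlen($dir) + 1);
    }, $paths);
    sort($names, SORT_STRING); // fixed byte-wise order for comparison
    return $names;
};

$withDot  = $strip(glob($dir . '/*.*')); // names that contain a dot
$anything = $strip(glob($dir . '/*'));   // every non-hidden name

var_dump($withDot, $anything);

// Clean up the scratch directory.
foreach ($anything as $name) {
    unlink($dir . '/' . $name);
}
rmdir($dir);
```

If the file starting with "!" shows up for '*' but not for '*.*', the missing dot rather than the exclamation mark is the reason it was skipped.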
 [2017-07-14 22:10 UTC] furun at arcor dot de
The Linux system is:
Host: x86_64-redhat-linux-gnu
(The system is rented and I have only very limited control; as far as I know there is no connection with Windows.)

glob() and every other file function must work correctly on any system (Windows 7 or later, maybe XP, and Linux).
The bug is that glob() doesn't return a complete list and returns incomplete file names.
I suspect that UTF-8 is handled incorrectly.

Thanks for tidying up the code... :-)
 [2017-07-15 00:47 UTC] spam2 at rhsoft dot net
"x86_64-redhat-linux-gnu" can be anything

"cat /etc/redhat-release" and "uname -r" should give you more information, and phpinfo() also shows the kernel, which has a suffix such as "el6" or "el7" that indicates the RHEL/CentOS version.
 [2017-07-15 14:04 UTC] furun at arcor dot de
-Status: Feedback +Status: Open
 [2017-07-15 14:04 UTC] furun at arcor dot de
The hoster says "CentOS of Red Hat Enterprise" and gives no further information.
phpinfo() shows: "Host x86_64-redhat-linux-gnu".
It is a professional web hoster, so let's assume they use the newest stable LTS version.


glob() shows:

File             Linux           Win7
"Änderung"       -               -                (not in the list)
"Änderung.txt"   "nderung.txt"   "Änderung.txt"
 [2017-07-15 14:15 UTC] spam2 at rhsoft dot net
"Host x86_64-redhat-linux-gnu" is nonsense: that is the output of "Configure Command". There is also "System", and if the hoster has patched PHP so that it neither shows you the environment nor tells you anything useful, then ask the hoster to solve your PHP problem, because nobody can help you from outside in that case.

If you can't follow http://www.catb.org/esr/faqs/smart-questions.html#beprecise (for whatever reasons), how do you expect anybody to help?

Here is a copy&paste from a sane install. It has "System", which shows the kernel, "4.11.10-200.fc25.x86_64", and the fc25 shows that it's a Fedora 25 machine:

PHP Version 7.1.8-dev
System 	Linux srv-rhsoft.rhsoft.net 4.11.10-200.fc25.x86_64 #1 SMP Wed Jul 12 19:04:52 UTC 2017 x86_64
Build Date 	Jul 13 2017 17:25:13
Configure Command 	' ./configure' ' --host=x86_64-redhat-linux' ' --build=x86_64-redhat-linux' ' --target=x86_64-redhat-linux' ' --prefix=/usr' ' --program-prefix=' ' --libdir=/usr/lib64/php' ' --disable-all' ' --disable-dependency-tracking' ' --enable-bcmath=shared' ' --enable-calendar=shared' ' --enable-cli' ' --enable-ctype=shared' ' --enable-dom=shared' ' --enable-exif=shared' ' --enable-fileinfo=shared' ' --enable-filter' ' --enable-hash=shared' ' --enable-huge-code-pages' ' --enable-inline-optimization' ' --enable-intl=shared' ' --enable-json=shared' ' --enable-libxml' ' --enable-mbregex' ' --enable-mbstring=shared' ' --enable-mysqlnd=shared' ' --enable-opcache=shared' ' --enable-opcache-jit' ' --enable-pcntl=shared' ' --enable-pdo=shared' ' --enable-phar=shared' ' --enable-posix=shared' ' --enable-re2c-cgoto' ' --enable-session=shared' ' --enable-shared' ' --enable-simplexml=shared' ' --enable-soap=shared' ' --enable-sockets=shared' ' --enable-tokenizer=shared' ' --enable-xml=shared' ' --enable-xmlreader=shared' ' --enable-xmlwriter=shared' ' --enable-zip=shared' ' --with-apxs2=/usr/bin/apxs' ' --with-bz2=shared, /usr' ' --with-config-file-path=/etc' ' --with-config-file-scan-dir=/etc/php.lounge.d' ' --with-curl=shared, /usr' ' --with-freetype-dir=/usr' ' --with-gd=shared, /usr' ' --with-gettext=shared, /usr' ' --with-iconv=shared' ' --with-imap-ssl=/usr' ' --with-imap=shared, /usr' ' --with-kerberos=/usr' ' --with-layout=GNU' ' --with-libdir=lib64' ' --with-libedit=shared, /usr' ' --with-libxml-dir=/usr' ' --with-libzip=/usr' ' --with-mysql-sock=/var/lib/mysql/mysql.sock' ' --with-mysqli=shared, mysqlnd' ' --with-openssl=shared, /usr' ' --with-pcre-jit' ' --with-pcre-regex=/usr' ' --with-pdo-mysql=shared, mysqlnd' ' --with-pic' ' --with-system-ciphers' ' --with-system-tzdata' ' --with-tidy=shared, /usr' ' --with-zlib=shared' ' --with-zlib-dir=/usr' ' --disable-cgi' ' --disable-dmalloc' ' --disable-dtrace' ' --disable-gcov' ' --disable-gd-jis-conv' ' 
--disable-ipv6' ' --disable-mysqlnd-compression-support' ' --disable-opcache-file' ' --disable-phpdbg' ' --disable-rpath' ' --disable-short-tags' ' --disable-static' ' --enable-gcc-global-regs' ' --disable-debug' ' build_alias=x86_64-redhat-linux' ' host_alias=x86_64-redhat-linux' ' target_alias=x86_64-redhat-linux'
 [2017-07-18 21:22 UTC] ab@php.net
-Status: Open +Status: Not a bug
 [2017-07-18 21:22 UTC] ab@php.net
This looks the same as bug #74934. Especially on Linux, the test system most likely has some environment issue. On Windows I got the same results as @danack. Besides the environment and the process locale, the encoding of the script itself matters, too.
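
On the process locale point: the PHP manual notes that basename() is locale aware, so with a locale that does not match the encoding of the file names it may not return multibyte names intact. One way to rule that out is to take the last path component at the byte level. This is a minimal sketch, assuming the paths use "/" as separator and the names are UTF-8; path_leaf() is a hypothetical helper, not a PHP function.

```php
<?php
/**
 * Hypothetical helper: return the last component of $path by splitting on
 * "/" at the byte level, without consulting the process locale.
 * The byte 0x2F ("/") never occurs inside a multibyte UTF-8 sequence,
 * so UTF-8 names come through unchanged.
 */
function path_leaf(string $path): string
{
    $path = rtrim($path, '/');
    $pos  = strrpos($path, '/');
    return $pos === false ? $path : substr($path, $pos + 1);
}

// "äÄ.txt" written as explicit UTF-8 bytes, so the encoding of this
// script file does not influence the result.
$leaf = path_leaf("/testing/test/\xC3\xA4\xC3\x84.txt");
echo $leaf, "\t", strlen($leaf), "\n"; // strlen() counts bytes: 8
```

If the names come out intact this way while basename() shortens them, the process locale is the likely culprit rather than glob().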

The glob() behavior seems correct across platforms. The glob() rules are the same as in a shell. For example, !3 has a special meaning in bash, so it is not interpreted as a filename. Various tools also interpret files with meta characters in a different way. If you're required to use such filenames, it likely makes more sense to use other iteration possibilities.

Thanks.
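
As an illustration of such an alternative iteration, here is a minimal sketch that lists a directory with scandir(), so that no glob pattern rules apply to the names. list_entries() and the scratch directory are hypothetical and only serve the example; any filtering by extension would have to be done on the returned names in PHP.

```php
<?php
/**
 * Hypothetical helper: list the entries of $dir with scandir() and drop
 * the "." and ".." pseudo-entries.
 */
function list_entries(string $dir): array
{
    $entries = scandir($dir);
    if ($entries === false) {
        return [];
    }
    $names = array_values(array_diff($entries, ['.', '..']));
    sort($names, SORT_STRING); // byte-wise order, independent of the locale
    return $names;
}

// Demo in a scratch directory (the path is an assumption for illustration).
$dir = sys_get_temp_dir() . '/scandir_demo_' . uniqid();
mkdir($dir, 0755);

// The second name is "äÄ.txt" written as explicit UTF-8 bytes.
$created = ['!no_dot', "\xC3\xA4\xC3\x84.txt"];
foreach ($created as $name) {
    file_put_contents($dir . '/' . $name, 'foo');
}

$listed = list_entries($dir);
var_dump($listed);

// Clean up the scratch directory.
foreach ($listed as $name) {
    unlink($dir . '/' . $name);
}
rmdir($dir);
```

Both the name without a dot and the name starting with non-ASCII bytes should appear in the list, since scandir() returns the names as stored by the filesystem.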
 