Bug #74927 UTF8 characters wrong processed in file system
Submitted: 2017-07-14 19:52 UTC Modified: 2017-07-18 21:22 UTC
From: furun at arcor dot de Assigned:
Status: Not a bug Package: *Directory/Filesystem functions
PHP Version: 7.1.7 OS: Win7 and Linux
Private report: No CVE-ID: None
 [2017-07-14 19:52 UTC] furun at arcor dot de
Description:
------------
UTF-8 characters are processed incorrectly by file system functions.

PHP 7.1.7, Win7, XAMPP
PHP 7.1.5, Linux

PHP has problems processing UTF-8 characters in file names correctly.
There may be multiple buggy file functions, but I tested file_put_contents, file_get_contents and glob.
file_put_contents and file_get_contents write and read the files correctly,
but glob does not list them correctly.

(Is there an .htaccess, PHP or system option to fix this, or is it a problem in the PHP code? I think it is a PHP bug.)
(Does PHP 7 still have very weak support for UTF-8 encoding? Maybe all file functions should be tested?)


Results:
On both operating systems (Windows 7 and Linux) the files are created correctly,
but they are not read back correctly with glob().


file_put_contents, file_get_contents (OK):
!äÄ
ä!Ä
äÄ!
ä!Ä.txt
äÄ.txt
 ÿ.txt
Āﻼ.txt
أ¤أ„.tx2


glob() (BUG):

PHP_OS: WINNT
 ÿ.txt	8
ä!Ä.txt	9
äÄ.txt	8
Āﻼ.txt	9
أ¤أ„.tx2	13

PHP_OS: Linux
.txt	4
!Ä.txt	7
.txt	4
.txt	4
.tx2	4


Test script:
---------------

define('DIR_BASE', realpath(dirname(__FILE__) . DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR);
define('DIR_TEMP', DIR_BASE . 'temp' . DIRECTORY_SEPARATOR);


print(DIR_BASE . '<br>');
print(DIR_TEMP . '<br>');
print('<br>');
print('PHP_OS: ' . PHP_OS . '<br><br>');


try {
	if (! file_exists(DIR_TEMP) && ! is_dir(DIR_TEMP)) {
		$check = mkdir(DIR_TEMP, 0755); 
		if ($check) $check = chmod(DIR_TEMP, 0755);
	}
} catch (Exception $exception) {
}



print('file_put_contents, file_get_contents:<br>');

$fileName = "!äÄ";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');

$fileName = "ä!Ä";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');

$fileName = "äÄ!";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');

$fileName = "ä!Ä.txt";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');


$fileName = "äÄ.txt";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');

$fileName = " ÿ.txt";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');

$fileName = "Āﻼ.txt";
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');


$fileName = "äÄ.tx2";
$fileName = iconv('windows-1256', 'utf-8', $fileName);
        file_put_contents(DIR_TEMP . $fileName, $fileName);
$data = file_get_contents(DIR_TEMP . $fileName);
print($data . '<br>');


$data  = ''; 
$data .= '<table class="text documentlist sortable">' . "\n";
$data .= '<tbody>' . "\n";

print('<br><br>glob:<br>');
$fileList = glob(DIR_TEMP . '*.*', 0);
foreach ($fileList as $filePath) {
	$fileName = basename($filePath);
	
	// Build one table row per file: link, then byte length of the name.
	$data .= 	'<tr>' .
				'<td>' . '<a href="temp/' . htmlentities(urlencode($fileName)) . '">' . $fileName . '</a></td>'.
				'<td>' . strlen($fileName) . '</td>'.
				'</tr>' . "\n"; // FIXME: reading in
}

$data .= '</tbody>' . "\n";
$data .= '</table>' . "\n";
print($data);


Expected result:
----------------
All files are listed with their correct names.


 [2017-07-14 21:14 UTC] danack@php.net
-Status: Open +Status: Feedback
 [2017-07-14 21:14 UTC] danack@php.net
What "linux" system are you testing on?


If it's the alleged Linux sub-system on Windows 10, you are very likely to just be seeing Windows issues come through the alleged VM.


Testing on CentOS with the simpler code below doesn't show any problems, except that glob() doesn't return any files starting with an exclamation mark.



<?php

define('DIR_TEMP', __DIR__ . '/test/');
@mkdir(DIR_TEMP, 0755, true);
print('PHP_OS: ' . PHP_OS . '<br><br>');

$filenames = [
    "!äÄ",
    "!Hello_I_start_with_an_exclamation",
    " ÿ.txt",
    "Āﻼ.txt",
    "ä!Ä.txt",
    "äÄ.txt",
    iconv('windows-1256', 'utf-8', "äÄ.tx2")
];

foreach ($filenames as $filename) {
    $written = file_put_contents(DIR_TEMP . $filename, 'foo');
    if ($written === false) {
        echo "Failed to write file $filename \n";
    }

    if (file_exists(DIR_TEMP . $filename) === false) {
        echo "written but doesn't exist?\n";
    }
}

var_dump($filenames);
$fileList = glob(DIR_TEMP . '*.*', 0);
var_dump($fileList);




PHP_OS: Linux<br><br>array(7) {
  [0]=>
  string(5) "!äÄ"
  [1]=>
  string(34) "!Hello_I_start_with_an_exclamation"
  [2]=>
  string(7) " ÿ.txt"
  [3]=>
  string(9) "Āﻼ.txt"
  [4]=>
  string(9) "ä!Ä.txt"
  [5]=>
  string(8) "äÄ.txt"
  [6]=>
  string(13) "أ¤أ„.tx2"
}
array(5) {
  [0]=>
  string(58) "/testing/test/ ÿ.txt"
  [1]=>
  string(60) "/testing/test/Āﻼ.txt"
  [2]=>
  string(61) "/testing/test/ä!Ä.txt"
  [3]=>
  string(60) "/testing/test/äÄ.txt"
  [4]=>
  string(67) "/testing/test/أ¤أ„.tx2"
}
 [2017-07-14 22:10 UTC] furun at arcor dot de
The Linux system is:
Host: x86_64-redhat-linux-gnu
(The system is rented and I have only very limited control. No connection with Windows??)

glob() and every other file function must work correctly on any system (Win7 or later, maybe XP; Linux...).
The bug is that glob() doesn't return a complete list and returns incomplete file names.
I suspect that UTF-8 is handled incorrectly.

Thanks for tidying up the code... :-)
 [2017-07-15 00:47 UTC] spam2 at rhsoft dot net
"x86_64-redhat-linux-gnu" can be anything.

"cat /etc/redhat-release" and "uname -r" should give you more information. Even phpinfo() shows the kernel version, whose suffix ("el6", "el7" and so on) indicates the RHEL/CentOS version.
 [2017-07-15 14:04 UTC] furun at arcor dot de
-Status: Feedback +Status: Open
 [2017-07-15 14:04 UTC] furun at arcor dot de
The hoster says "CentOS of Red Hat Enterprise" and won't give more information.
phpinfo() shows: "Host x86_64-redhat-linux-gnu"
It is a professional web hoster, so let's assume they use the newest stable LTS version.


glob() shows:

File             Linux           Win7
"Änderung"       -               -                (not in the list)
"Änderung.txt"   "nderung.txt"   "Änderung.txt"
 [2017-07-15 14:15 UTC] spam2 at rhsoft dot net
"Host x86_64-redhat-linux-gnu" - nonsense - that's the output of "Configure Command". There is also "System". If the hoster has patched PHP to hide the environment from you and won't tell you anything useful, then ask the hoster to solve your PHP problem, because in that case nobody can help you from outside.

If you can't follow http://www.catb.org/esr/faqs/smart-questions.html#beprecise (no matter for what reasons), how do you expect anybody to help?

Here is a copy & paste from a sane install. It has "System", which shows the kernel "4.11.10-200.fc25.x86_64", and the fc25 shows it's a Fedora 25 machine.

PHP Version 7.1.8-dev
System 	Linux srv-rhsoft.rhsoft.net 4.11.10-200.fc25.x86_64 #1 SMP Wed Jul 12 19:04:52 UTC 2017 x86_64
Build Date 	Jul 13 2017 17:25:13
Configure Command 	' ./configure' ' --host=x86_64-redhat-linux' ' --build=x86_64-redhat-linux' ' --target=x86_64-redhat-linux' ' --prefix=/usr' ' --program-prefix=' ' --libdir=/usr/lib64/php' ' --disable-all' ' --disable-dependency-tracking' ' --enable-bcmath=shared' ' --enable-calendar=shared' ' --enable-cli' ' --enable-ctype=shared' ' --enable-dom=shared' ' --enable-exif=shared' ' --enable-fileinfo=shared' ' --enable-filter' ' --enable-hash=shared' ' --enable-huge-code-pages' ' --enable-inline-optimization' ' --enable-intl=shared' ' --enable-json=shared' ' --enable-libxml' ' --enable-mbregex' ' --enable-mbstring=shared' ' --enable-mysqlnd=shared' ' --enable-opcache=shared' ' --enable-opcache-jit' ' --enable-pcntl=shared' ' --enable-pdo=shared' ' --enable-phar=shared' ' --enable-posix=shared' ' --enable-re2c-cgoto' ' --enable-session=shared' ' --enable-shared' ' --enable-simplexml=shared' ' --enable-soap=shared' ' --enable-sockets=shared' ' --enable-tokenizer=shared' ' --enable-xml=shared' ' --enable-xmlreader=shared' ' --enable-xmlwriter=shared' ' --enable-zip=shared' ' --with-apxs2=/usr/bin/apxs' ' --with-bz2=shared, /usr' ' --with-config-file-path=/etc' ' --with-config-file-scan-dir=/etc/php.lounge.d' ' --with-curl=shared, /usr' ' --with-freetype-dir=/usr' ' --with-gd=shared, /usr' ' --with-gettext=shared, /usr' ' --with-iconv=shared' ' --with-imap-ssl=/usr' ' --with-imap=shared, /usr' ' --with-kerberos=/usr' ' --with-layout=GNU' ' --with-libdir=lib64' ' --with-libedit=shared, /usr' ' --with-libxml-dir=/usr' ' --with-libzip=/usr' ' --with-mysql-sock=/var/lib/mysql/mysql.sock' ' --with-mysqli=shared, mysqlnd' ' --with-openssl=shared, /usr' ' --with-pcre-jit' ' --with-pcre-regex=/usr' ' --with-pdo-mysql=shared, mysqlnd' ' --with-pic' ' --with-system-ciphers' ' --with-system-tzdata' ' --with-tidy=shared, /usr' ' --with-zlib=shared' ' --with-zlib-dir=/usr' ' --disable-cgi' ' --disable-dmalloc' ' --disable-dtrace' ' --disable-gcov' ' --disable-gd-jis-conv' ' 
--disable-ipv6' ' --disable-mysqlnd-compression-support' ' --disable-opcache-file' ' --disable-phpdbg' ' --disable-rpath' ' --disable-short-tags' ' --disable-static' ' --enable-gcc-global-regs' ' --disable-debug' ' build_alias=x86_64-redhat-linux' ' host_alias=x86_64-redhat-linux' ' target_alias=x86_64-redhat-linux'
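
On a host where phpinfo() output is restricted, the same "System" information can usually be read from a script. A minimal sketch, assuming the host has not disabled php_uname(); the helper name environment_report() is made up for this example:

```php
<?php
// Hypothetical helper: collects the environment details asked for in this thread.
function environment_report(): array
{
    return [
        'php_version' => PHP_VERSION,
        'os_family'   => PHP_OS,
        // Full uname string; this should match the "System" row of phpinfo().
        'system'      => php_uname(),
        // Kernel release only, the part that carries a suffix such as el7 or fc25.
        'release'     => php_uname('r'),
        // Passing "0" queries the current locale without changing it.
        'lc_ctype'    => setlocale(LC_CTYPE, '0'),
    ];
}

foreach (environment_report() as $key => $value) {
    echo $key, ': ', var_export($value, true), "\n";
}
```

The LC_CTYPE entry is included because the process locale comes up later in the thread as a possible factor.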
 [2017-07-18 21:22 UTC] ab@php.net
-Status: Open +Status: Not a bug
 [2017-07-18 21:22 UTC] ab@php.net
This looks the same as bug #74934. Especially on Linux, the test system most likely has some environment issue. On Windows I got the same results as @danack. Besides the environment and the process locale, the encoding of the script itself matters, too.
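
The process locale point can be illustrated with basename(), which the original test script applies to every glob() result. The PHP manual describes basename() as locale aware, so one plausible reading (not confirmed in this thread) is that the leading bytes were dropped there rather than in glob(). A minimal sketch, assuming UTF-8 file names; plain_basename() is a hypothetical helper and the locale names are guesses that may not be installed on a given system:

```php
<?php
// Hypothetical helper: cuts at the last "/" byte and ignores the locale.
// This is safe for UTF-8 because "/" never occurs inside a multibyte sequence.
// Assumes "/" as the only separator (no Windows backslashes).
function plain_basename(string $path): string
{
    $pos = strrpos($path, '/');
    return $pos === false ? $path : substr($path, $pos + 1);
}

$path = "/tmp/\u{E4}\u{C4}.txt"; // "äÄ.txt" encoded as UTF-8

echo 'plain: ', bin2hex(plain_basename($path)), "\n";

// basename() under the "C" locale; the result may differ by PHP version and platform.
setlocale(LC_CTYPE, 'C');
echo 'C:     ', bin2hex(basename($path)), "\n";

// Try UTF-8 locales in order; setlocale() returns false if none is available.
$locale = setlocale(LC_CTYPE, 'C.UTF-8', 'en_US.UTF-8');
if ($locale !== false) {
    echo $locale, ': ', bin2hex(basename($path)), "\n";
}
```

Comparing the hex dumps shows whether basename() keeps the leading bytes under each locale on the system being tested.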

The glob() behavior seems correct across platforms. The glob() rules are the same as in a shell. For example, !3 has a special meaning in bash, so it is not interpreted as a filename. Various tools also interpret files with meta characters differently. If you're required to use such filenames, it likely makes more sense to use other iteration possibilities.
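
As a side note on the pattern used in both scripts: `*.*` only matches names that contain a literal dot, so entries without one are skipped by it whatever their first character is. A minimal sketch with ASCII names in a scratch directory (the directory and file names are made up for illustration); scandir() stands in here for one of the other iteration possibilities:

```php
<?php
// Scratch directory and file names are illustrative assumptions.
$dir = sys_get_temp_dir() . '/glob_demo_' . uniqid() . '/';
mkdir($dir, 0755, true);

$names = ['no_dot', 'with_dot.txt', '!bang', '!bang.txt'];
foreach ($names as $name) {
    file_put_contents($dir . $name, 'foo');
}

// Strip the directory prefix by length instead of calling basename().
$strip = function (string $path) use ($dir): string {
    return substr($path, strlen($dir));
};

// '*.*' needs a literal dot somewhere in the name.
$withDot = array_map($strip, glob($dir . '*.*'));
// '*' matches every entry that does not start with a dot.
$all = array_map($strip, glob($dir . '*'));
// scandir() does no pattern matching; only "." and ".." are filtered out here.
$raw = array_values(array_diff(scandir($dir), ['.', '..']));

var_dump($withDot, $all, $raw);

// Clean up the scratch files.
foreach ($names as $name) {
    unlink($dir . $name);
}
rmdir($dir);
```

With these four names, only the two that contain a dot should appear in the first list, while the other two lists should contain all four.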

Thanks.
 