Bug #43148 filesize and unicode filenames
Submitted: 2007-10-30 18:25 UTC Modified: 2010-11-10 18:11 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: banu_daniel1 at yahoo dot com Assigned:
Status: Not a bug Package: Filesystem function related
PHP Version: 5.2.4 OS: windows xp 32 bits
Private report: No CVE-ID: None
 [2007-10-30 18:25 UTC] banu_daniel1 at yahoo dot com
Description:
------------
It seems this bug was already reported and closed with the note "Marking as closed... also please try a newer version.", but the problem is still there, even on Windows XP. The problem is this:
the filesize function does not work with filenames that contain Unicode characters.
On the Linux version I don't have this problem.

Reproduce code:
---------------
$size = sprintf("%u", filesize($eps["FNAME"][$f]));

Actual result:
--------------
Warning: filesize() [function.filesize]: stat failed for E:\Emilya・Chad.mpg in C:\WWW\files.php on line 45

Just an example.

 [2007-10-30 21:32 UTC] carsten_sttgt at gmx dot de
The filesystem functions have no problem with Unicode (UTF-8). But you must use the Unicode character itself, not an HTML entity that represents it.

| $size= sprintf("%u",
|     filesize(html_entity_decode($eps["FNAME"][$f],
|         ENT_COMPAT, 'UTF-8')
|     )
| );

Regards,
Carsten
 [2007-10-31 18:37 UTC] banu_daniel1 at yahoo dot com
What I wrote in this example is not an HTML entity. I don't know what converted it to an HTML entity. The problem is that I have Unicode characters in the filename, and filesize and file_exists (as I discovered just now, and maybe other functions for file/directory access) are not working.
 [2007-10-31 20:57 UTC] carsten_sttgt at gmx dot de
Well, the "・" is a Katakana middle dot. Have you not used this char in your sample? Because I have no problems creating and accessing files with this char in the filename with PHP.

Do you have a better (complete) test script to reproduce the problem? Or what's the value of $eps["FNAME"][$f] in hex?

Regards,
Carsten
 [2007-10-31 22:09 UTC] banu_daniel1 at yahoo dot com
Well, I copy/pasted the name of the file and did this:
echo filesize("E:\Emilya?Chad.mpg"); and i have the same result
and also this
if (file_exists("E:\Emilya?Chad.mpg")) echo "ok" and is not working.

this is the hex value  453A456D696C7961FA436861642E6D7067
 [2007-10-31 23:45 UTC] carsten_sttgt at gmx dot de
> this is the hex value  453A456D696C7961FA436861642E6D7067

Well, this hex looks like ISO-8859-1 and not UTF-8:
453A456D696C7961FA436861642E6D7067
E : E m i l y a ú C h a d . m p g
(The char between Emilya and Chad is a Latin small letter u with acute.)


> if (file_exists("E:\Emilya?Chad.mpg")) echo "ok" and is not working.
In this string we have (in UTF-8):
453A5C456D696C7961C2B7436861642E6D7067
E : \ E m i l y a ·   C h a d . m p g
(or in ISO-8859-1):
453A5C456D696C7961B7436861642E6D7067
E : \ E m i l y a · C h a d . m p g
(The char between Emilya and Chad is a middle dot.)
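
The byte-level check above can be repeated from PHP itself. This is only a minimal sketch: describe_name() is a hypothetical helper (not part of PHP), and the sample bytes are the ones from the first hex dump quoted above.

```php
<?php
// Hypothetical helper: report the raw bytes of a filename and whether they
// form valid UTF-8. With the /u modifier, preg_match() rejects invalid UTF-8
// by returning false, so only a return value of 1 means "valid".
function describe_name(string $name): array
{
    return [
        'hex'        => strtoupper(bin2hex($name)),
        'valid_utf8' => preg_match('//u', $name) === 1,
    ];
}

// The bytes from the report: "E:Emilya", the single byte 0xFA, "Chad.mpg".
$info = describe_name("E:Emilya\xFAChad.mpg");
echo $info['hex'], "\n";
echo $info['valid_utf8'] ? "valid UTF-8\n" : "not valid UTF-8\n";
```

A lone 0xFA byte cannot occur in UTF-8, so a name like this must be in a single-byte encoding such as ISO-8859-1.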


This works here:
<?php
$name = 'Emilya?Chad.mpg';
touch($name);
echo filesize($name);
?>


What's the codepage/language of your system?
What's the result (filename) if you use glob() or the directory class dir()?

Regards,
Carsten
 [2007-11-01 19:51 UTC] banu_daniel1 at yahoo dot com
I've done this:
<?
if (file_exists("D:\Downloads\&#21205;&#20043;&#23478;&#30332;&#20296;\&#12495;&#12516;&#12486;&#12398;&#12372;&#12392;&#12367;01\DVD_VIDEO.MDS")) echo "ok";
else echo "files dose not exist<br>\n";
print_r(glob("D:\Downloads\&#21205;&#20043;&#23478;&#30332;&#20296;\&#12495;&#12516;&#12486;&#12398;&#12372;&#12392;&#12367;01\DVD_VIDEO.MDS"));

//exemple from help
$d = dir("D:\Downloads\&#21205;&#20043;&#23478;&#30332;&#20296;");
echo "Handle: " . $d->handle . "\n";
echo "Path: " . $d->path . "\n";
while (false !== ($entry = $d->read())) {
  echo $entry."<br>\n";
}
$d->close();

?>
 [2007-11-01 19:53 UTC] banu_daniel1 at yahoo dot com
results:

files dose not exist
Array ( )
Warning: dir(D:\Downloads\&#21205;&#20043;&#23478;&#30332;&#20296;) [function.dir]: failed to open dir: No such file or directory in C:\WWW\test2.php on line 7
Handle: Path:
Fatal error: Call to a member function read() on a non-object in C:\WWW\test2.php on line 10
 [2007-11-01 19:54 UTC] banu_daniel1 at yahoo dot com
And again, all characters were converted to HTML entities.
 [2007-11-01 20:59 UTC] carsten_sttgt at gmx dot de
> print_r(glob("D:\Downloads\&#21205;&#20043;&#23478;&#30332;&#20296;\
> &#12495;&#12516;&#12486;&#12398;&#12372;&#12392;&#12367;01\
> DVD_VIDEO.MDS"));

Sorry, I was not clear enough... Don't pass the full path to glob(). Just look at what glob() returns for the parent directory:
| $dirs = glob('"D:/Downloads/*', GLOB_ONLYDIR);

And now analyse the array $dirs. One of its values must be the directory "D:/Downloads/&#21205;&#20043;&#23478;&#30332;&#20296;", so look at how this directory is stored in the array.
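
One way to see how each directory name is stored, independent of what the console or browser can display, is to dump the returned paths as hex. This is only a sketch: hex_listing() is a hypothetical helper, and the D:/Downloads path is taken from the report.

```php
<?php
// Hypothetical helper: map every path returned by glob() to the hex of its
// raw bytes, so non-ASCII bytes stay visible whatever the output encoding is.
function hex_listing(string $pattern, int $flags = 0): array
{
    $out = [];
    // glob() can return false on error; treat that like "no matches".
    foreach (glob($pattern, $flags) ?: [] as $path) {
        $out[$path] = strtoupper(bin2hex($path));
    }
    return $out;
}

foreach (hex_listing('D:/Downloads/*', GLOB_ONLYDIR) as $path => $hex) {
    echo $path, ' => ', $hex, "\n";
}
```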

Regards,
Carsten
 [2007-11-01 21:42 UTC] banu_daniel1 at yahoo dot com
$dirs = glob('"D:/Downloads/*', GLOB_ONLYDIR);
print_r($dirs);

result is
Array ( )
 [2007-11-01 21:57 UTC] carsten_sttgt at gmx dot de
> dirs = glob('"D:/Downloads/*', GLOB_ONLYDIR);
             --^
Please remove my typo... (you have not seen that?):
| $dirs = glob('D:/Downloads/*', GLOB_ONLYDIR);

Regards,
Carsten
 [2007-11-01 22:11 UTC] banu_daniel1 at yahoo dot com
No, I didn't see that. I removed that " and the result is exactly the same (Array ( )).
I tried with other folders (non-UTF) and it works.
 [2007-11-02 17:48 UTC] carsten_sttgt at gmx dot de
> but the problem is still there even on windows xp
> so this is the problem filesize function dose not
> work with filenames with unicode characters.

Ok, after some more tests, I can reproduce this problem. Just look at this shell log:
| D:\>cd D:\Apache2.2\htdocs\test\&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;
|
| D:\Apache2.2\htdocs\test\&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;>dir /b
| index.html
| phpinfo.php
|
| D:\Apache2.2\htdocs\test\&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;>type index.html
| <html><body><h1>It works!</h1></body></html>
| D:\Apache2.2\htdocs\test\&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;>type phpinfo.php
| <?php phpinfo(); ?>
|
| D:\Apache2.2\htdocs\test\&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;>pear-request http://localhost/
| test/%ce%b1%ce%b2%ce%b3%ce%b4%ce%b5%ce%b6%ce%b7%ce%b8/index.html
| <html><body><h1>It works!</h1></body></html>
| D:\Apache2.2\htdocs\test\&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;>php -r "echo getcwd();"
| D:\Apache2.2\htdocs\test\a??de???
| D:\Apache2.2\htdocs\test\&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;>cd..
|
| D:\Apache2.2\htdocs\test>php -r "var_dump(stat('&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;'));"
|
| Warning: stat(): stat failed for a??de??? in Command line code on
|  line 1
| bool(false)
|
| D:\Apache2.2\htdocs\test>

As you can see, I can't execute a PHP script in this folder ("&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;") or use the PHP filesystem functions with this path. But I can access this folder correctly with Apache via HTTP.


> on linux version i don't have this problem.

That's the difference. On Linux (or in PHP) you only have UTF-8. But Windows uses UTF-16 (or the current code page of the installed locale).


Just look at this script "test.php" (encoded in UTF-8):
| <?php
| mkdir('&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;');
| var_dump(is_dir('&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;'));
| ?>

and the shell log:
| D:\Apache2.2\htdocs\test>php test.php
| bool(true)
|
| D:\Apache2.2\htdocs\test>dir /b
| test.php
| αβγδεζηθ
| 
| D:\Apache2.2\htdocs\test>

As you can see, you can create and access paths with such a name from PHP, but only inside PHP. In Windows or Apache you must use another (wrong) name. In this case PHP just uses the byte sequence of the UTF-8 chars as Latin1 chars.

This can be a quick fix for you, but it is indeed not correct.

The problem is that PHP only uses the simple string and filesystem functions in its C sources, which only work with the current locale code page. It does not use the wide-char filesystem functions from the Windows SDK, as Apache does.
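
The byte reinterpretation described above can be sketched in plain PHP. The helper latin1_view() is hypothetical, and ISO-8859-1 is assumed here as the single-byte encoding; the code page actually active on a given Windows system may differ.

```php
<?php
// Hypothetical helper: read each byte of $bytes as one ISO-8859-1 character
// and return the result encoded as UTF-8, i.e. what a one-byte-per-char API
// makes of the string.
function latin1_view(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $ch) {
        $b = ord($ch);
        // Bytes below 0x80 are ASCII; the rest become two-byte UTF-8 sequences.
        $out .= $b < 0x80 ? $ch : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// UTF-8 bytes of the eight-letter Greek directory name from the test above.
$name = "\xCE\xB1\xCE\xB2\xCE\xB3\xCE\xB4\xCE\xB5\xCE\xB6\xCE\xB7\xCE\xB8";

echo strlen($name), "\n";        // byte count: two bytes per Greek letter
echo rawurlencode($name), "\n";  // same bytes as the URL in the shell log
echo latin1_view($name), "\n";   // the name as a single-byte reading sees it
```

Each Greek letter occupies two bytes in UTF-8, so a single-byte API sees sixteen characters where the Unicode name has eight.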

BTW:
With a current PHP6 snap (full Unicode support?), this doesn't work either.

Regards,
Carsten

BTW:
There is another bug in this bug tracker. You can't use UTF-8 chars in bug reports: after submitting a comment, UTF-8 chars are replaced with entities, but all comments are placed between <pre> tags. Thus the browser shows the entities and not the correct chars.

Please open this html page with a browser:
| <html>
| <head>
| <meta http-equiv=content-type content="text/html; charset=UTF-8">
| </head>
| <body>
| &#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;
| </body>
| </html>
and replace all entities in my comment with the chars you can see in the browser.
 [2007-11-12 10:03 UTC] tony2001@php.net
PHP doesn't care if it's Unicode or not, it just passes the filename to the OSes filesystem func and if it fails - we can do nothing about it.
 [2010-11-10 17:33 UTC] anton85s at mail dot ru
"it just passes the filename to the OSes filesystem func and if it fails - we can do nothing about it."
But it doesn't pass the filename to the Unicode version of the filesystem function, right? That means PHP could at least be modified to use the correct filesystem functions, rather than the non-Unicode ones, for all calls.
 [2010-11-10 18:11 UTC] pajoye@php.net
Yes, and I'm working on this change; it will accept UTF-8 as input, just like we do on Unices/POSIX systems.
 [2012-02-04 21:33 UTC] sebastian dot mayer at maysoft dot de
Hello,

Running PHP 5.3.5 on Windows, I have a problem with the degree char "°".
scandir or opendir/readdir retrieves an entry called "Up-wards at 45°.mp3". If I try to get the filesize or filetime, the message "filesize(): stat failed" comes up. All other characters (for example German umlauts ä, ü ...) don't have this problem.
 
PHP Copyright © 2001-2020 The PHP Group
All rights reserved.
Last updated: Thu Nov 26 13:01:23 2020 UTC