Bug #43148 filesize and unicode filenames
Submitted: 2007-10-30 18:25 UTC Modified: 2010-11-10 18:11 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: banu_daniel1 at yahoo dot com Assigned:
Status: Not a bug Package: Filesystem function related
PHP Version: 5.2.4 OS: windows xp 32 bits
Private report: No CVE-ID: None

 [2007-10-30 18:25 UTC] banu_daniel1 at yahoo dot com
Description:
------------
It seems that this bug was reported before, and the earlier report says "Marking as closed... also please try a newer version." But the problem is still there, even on Windows XP. This is the problem:
the filesize() function does not work with filenames that contain Unicode characters.
On the Linux version I don't have this problem.

Reproduce code:
---------------
$size = sprintf("%u", filesize($eps["FNAME"][$f]));

Actual result:
--------------
Warning: filesize() [function.filesize]: stat failed for E:\Emilya・Chad.mpg in C:\WWW\files.php on line 45

Just an example.

 [2007-10-30 21:32 UTC] carsten_sttgt at gmx dot de
The filesystem functions have no problem with Unicode (UTF-8). But you must use an actual Unicode character, not an HTML entity that represents a Unicode character.

| $size= sprintf("%u",
|     filesize(html_entity_decode($eps["FNAME"][$f],
|         ENT_COMPAT, 'UTF-8')
|     )
| );
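A quick way to check what html_entity_decode() hands to filesize() is to compare the bytes before and after decoding. A minimal sketch, assuming (hypothetically) that the name arrived with the numeric entity &#12539; (U+30FB, the Katakana middle dot) in place of the real character:

```php
<?php
// Hypothetical input: the filename with an HTML entity instead of
// the Katakana middle dot (U+30FB).
$encoded = 'Emilya&#12539;Chad.mpg';

// Decode entities into real UTF-8 characters, as in the fix above.
$decoded = html_entity_decode($encoded, ENT_COMPAT, 'UTF-8');

// bin2hex() exposes the bytes: the entity is 8 ASCII bytes,
// the decoded character is the 3-byte UTF-8 sequence e3 83 bb.
echo bin2hex($encoded), "\n";
echo bin2hex($decoded), "\n";
```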

Regards,
Carsten
 [2007-10-31 18:37 UTC] banu_daniel1 at yahoo dot com
It's not an HTML entity that I wrote in this example; I don't know what converted it into an HTML entity. The problem is that I have Unicode characters in the filename, and filesize() and file_exists() (I just discovered the latter, and maybe other functions for file/directory access too) are not working.
 [2007-10-31 20:57 UTC] carsten_sttgt at gmx dot de
Well, the "・" is a Katakana middle dot. Didn't you use this character in your sample? I have no problem creating and accessing files with this character in the filename with PHP.

Do you have a better (complete) test script to reproduce the problem? Or what is the value of $eps["FNAME"][$f] in hex?
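For reference, PHP's bin2hex() gives that hex value directly. The sketch below prints the bytes with each ASCII character underneath, in the layout used further down in this thread. The function name hex_layout() is made up for illustration, a UTF-8 middle dot stands in for the real $eps["FNAME"][$f], and PHP 7 or later is assumed for the type declarations:

```php
<?php
// Hypothetical helper: print a string's bytes in hex, with the ASCII
// character (or '.') under each byte.
function hex_layout(string $s): string
{
    $hex = '';
    $chars = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $byte = ord($s[$i]);
        $hex .= sprintf('%02X', $byte);
        // Printable ASCII is shown as-is; anything else (UTF-8 or
        // codepage bytes >= 0x80) is shown as '.'.
        $chars .= ($byte >= 0x20 && $byte < 0x7F ? chr($byte) : '.') . ' ';
    }
    return $hex . "\n" . rtrim($chars) . "\n";
}

// Stand-in for $eps["FNAME"][$f]: a path with U+00B7 encoded as UTF-8.
echo hex_layout("E:\\Emilya\xC2\xB7Chad.mpg");
```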

Regards,
Carsten
 [2007-10-31 22:09 UTC] banu_daniel1 at yahoo dot com
Well, I copied and pasted the name of the file and did this:
echo filesize("E:\Emilya·Chad.mpg");
and I get the same result. I also tried this:
if (file_exists("E:\Emilya·Chad.mpg")) echo "ok";
and it does not work either.

This is the hex value: 453A456D696C7961FA436861642E6D7067
 [2007-10-31 23:45 UTC] carsten_sttgt at gmx dot de
> this is the hex value  453A456D696C7961FA436861642E6D7067

Well, this hex looks like ISO-8859-1, not UTF-8:
453A456D696C7961FA436861642E6D7067
E : E m i l y a ú C h a d . m p g
(The character between Emilya and Chad is a Latin small letter u with acute.)


> if (file_exists("E:\Emilya·Chad.mpg")) echo "ok" and is not working.
This string contains (in UTF-8):
453A5C456D696C7961C2B7436861642E6D7067
E : \ E m i l y a ·   C h a d . m p g
(or in ISO-8859-1):
453A5C456D696C7961B7436861642E6D7067
E : \ E m i l y a · C h a d . m p g
(The character between Emilya and Chad is a middle dot.)
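The point here is that one visible character has a different byte sequence in each encoding. A small sketch that encodes a code point as UTF-8 by hand makes the comparison concrete; codepoint_to_utf8() is a made-up name, it handles only the Basic Multilingual Plane (up to U+FFFF), and PHP 7 or later is assumed:

```php
<?php
// Hypothetical helper: encode one BMP code point (<= U+FFFF) as UTF-8.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);                               // 1 byte: ASCII
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))                  // 2 bytes
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))                     // 3 bytes
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// In ISO-8859-1 the first two are single bytes (FA and B7); the
// Katakana middle dot has no ISO-8859-1 representation at all.
$chars = [
    'latin small letter u with acute' => 0x00FA,
    'middle dot'                      => 0x00B7,
    'katakana middle dot'             => 0x30FB,
];
foreach ($chars as $label => $cp) {
    printf("U+%04X %-32s UTF-8: %s\n", $cp, $label,
        strtoupper(bin2hex(codepoint_to_utf8($cp))));
}
```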


This works here:
<?php
$name = 'Emilya·Chad.mpg';
touch($name);
echo filesize($name);
?>


What's the codepage/language of your system?
What's the result (filename) if you use glob() or the directory class dir()?

Regards,
Carsten
 [2007-11-01 19:51 UTC] banu_daniel1 at yahoo dot com
I've done this:
<?
if (file_exists("D:\Downloads\動之家發佈\ハヤテのごとく01\DVD_VIDEO.MDS")) echo "ok";
else echo "files dose not exist<br>\n";
print_r(glob("D:\Downloads\動之家發佈\ハヤテのごとく01\DVD_VIDEO.MDS"));

// example from the manual
$d = dir("D:\Downloads\動之家發佈");
echo "Handle: " . $d->handle . "\n";
echo "Path: " . $d->path . "\n";
while (false !== ($entry = $d->read())) {
  echo $entry."<br>\n";
}
$d->close();

?>
 [2007-11-01 19:53 UTC] banu_daniel1 at yahoo dot com
results:

files dose not exist
Array ( )
Warning: dir(D:\Downloads\動之家發佈) [function.dir]: failed to open dir: No such file or directory in C:\WWW\test2.php on line 7
Handle: Path:
Fatal error: Call to a member function read() on a non-object in C:\WWW\test2.php on line 10
 [2007-11-01 19:54 UTC] banu_daniel1 at yahoo dot com
And again, all the characters were converted into HTML entities.
 [2007-11-01 20:59 UTC] carsten_sttgt at gmx dot de
> print_r(glob("D:\Downloads\動之家發佈\
> ハヤテのごとく01\
> DVD_VIDEO.MDS"));

Sorry, I was not clear enough. Don't pass the full path to glob(); just look at what it returns for the directory listing:
| $dirs = glob('"D:/Downloads/*', GLOB_ONLYDIR);

Then analyse the array $dirs. One of its values should be the directory "D:/Downloads/動之家發佈"; look at how this directory name is stored in the array.
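A sketch of that check, assuming PHP 7 or later. list_dirs_with_hex() is a hypothetical helper, and D:/Downloads is the directory from this report. It returns each subdirectory name together with its raw bytes in hex, so a UTF-8 name (e5 8b 95 ... for 動) can be told apart from a codepage name or from '?' replacements (3f):

```php
<?php
// Hypothetical helper: list the subdirectories of $base and return
// [name => hex bytes of name], to see how each name is really encoded.
function list_dirs_with_hex(string $base): array
{
    $prefix = rtrim($base, '/\\') . '/';
    $dirs = glob($prefix . '*', GLOB_ONLYDIR);
    if ($dirs === false) {
        return [];                               // glob() can fail on some systems
    }
    $result = [];
    foreach ($dirs as $dir) {
        $name = substr($dir, strlen($prefix));   // strip the "$base/" prefix
        $result[$name] = bin2hex($name);
    }
    return $result;
}

// D:/Downloads is the directory from this report; adjust as needed.
print_r(list_dirs_with_hex('D:/Downloads'));
```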

Regards,
Carsten
 [2007-11-01 21:42 UTC] banu_daniel1 at yahoo dot com
$dirs = glob('"D:/Downloads/*', GLOB_ONLYDIR);
print_r($dirs);

result is
Array ( )
 [2007-11-01 21:57 UTC] carsten_sttgt at gmx dot de
> $dirs = glob('"D:/Downloads/*', GLOB_ONLYDIR);
              --^
Please remove my typo (didn't you notice it?):
| $dirs = glob('D:/Downloads/*', GLOB_ONLYDIR);

Regards,
Carsten
 [2007-11-01 22:11 UTC] banu_daniel1 at yahoo dot com
No, I didn't see that. I removed the " and the result is exactly the same (Array ( )).
I've tried with other folders (non-UTF) and it works.
 [2007-11-02 17:48 UTC] carsten_sttgt at gmx dot de
> but the problem is still there even on windows xp
> so this is the problem filesize function does not
> work with filenames with unicode characters.

Ok, after some more tests, I can reproduce this problem. Just look at this shell log:
| D:\>cd D:\Apache2.2\htdocs\test\αβγδεζηθ
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>dir /b
| index.html
| phpinfo.php
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>type index.html
| <html><body><h1>It works!</h1></body></html>
| D:\Apache2.2\htdocs\test\αβγδεζηθ>type phpinfo.php
| <?php phpinfo(); ?>
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>pear-request http://localhost/
| test/%ce%b1%ce%b2%ce%b3%ce%b4%ce%b5%ce%b6%ce%b7%ce%b8/index.html
| <html><body><h1>It works!</h1></body></html>
| D:\Apache2.2\htdocs\test\αβγδεζηθ>php -r "echo getcwd();"
| D:\Apache2.2\htdocs\test\a??de???
| D:\Apache2.2\htdocs\test\αβγδεζηθ>cd..
|
| D:\Apache2.2\htdocs\test>php -r "var_dump(stat('αβγδεζηθ'));"
|
| Warning: stat(): stat failed for a??de??? in Command line code on
|  line 1
| bool(false)
|
| D:\Apache2.2\htdocs\test>

As you can see, I can't execute a PHP script in this folder ("αβγδεζηθ") or use the PHP filesystem functions with this path. But I can access this folder correctly with Apache via HTTP.


> on linux version i don't have this problem.

That's the difference. On Linux (and inside PHP) a filename is just a byte string, normally UTF-8. Windows, however, uses UTF-16 for filenames (or, for non-Unicode programs, the current codepage of the installed locale).
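To make the difference concrete: the same name is one byte sequence in UTF-8 (what PHP passes around) and another in UTF-16LE (what the Windows wide-character functions take). A hand-written converter, assuming valid UTF-8 input and PHP 7 or later (utf8_to_utf16le() is a made-up name), shows both for the first two Greek letters:

```php
<?php
// Hypothetical helper: re-encode valid UTF-8 as UTF-16LE.
function utf8_to_utf16le(string $utf8): string
{
    $out = '';
    // With the /u modifier, preg_split() splits UTF-8 into characters.
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $b = array_values(unpack('C*', $ch));       // bytes of one character
        switch (count($b)) {
            case 1:  $cp = $b[0]; break;
            case 2:  $cp = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F); break;
            case 3:  $cp = (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6)
                         | ($b[2] & 0x3F); break;
            default: $cp = (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                         | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
        }
        if ($cp >= 0x10000) {
            // Characters outside the BMP become a surrogate pair.
            $cp -= 0x10000;
            $out .= pack('v', 0xD800 | ($cp >> 10)) . pack('v', 0xDC00 | ($cp & 0x3FF));
        } else {
            $out .= pack('v', $cp);                  // 'v' = 16-bit little-endian
        }
    }
    return $out;
}

$name = "\xCE\xB1\xCE\xB2";                      // "αβ" in UTF-8
echo bin2hex($name), "\n";                       // ceb1ceb2
echo bin2hex(utf8_to_utf16le($name)), "\n";      // b103b203
```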


Just look at this script "test.php" (encoded in UTF-8):
| <?php
| mkdir('αβγδεζηθ');
| var_dump(is_dir('αβγδεζηθ'));
| ?>

and the shell log:
| D:\Apache2.2\htdocs\test>php test.php
| bool(true)
|
| D:\Apache2.2\htdocs\test>dir /b
| test.php
| Î±Î²Î³Î´ÎµÎ¶Î·Î¸
| 
| D:\Apache2.2\htdocs\test>

As you can see, you can create and access paths with such names using PHP, but only from inside PHP. In Windows or Apache you must use a different (wrong) name. In this case PHP simply passes the UTF-8 byte sequence, and each byte is treated as a Latin-1 character.

This can serve as a quick workaround for you, but it is not really correct.
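The "wrong" name can be reproduced without Windows. A minimal sketch, assuming the system codepage is Latin-1/Windows-1252 (the two agree for every byte used here, but differ in 0x80-0x9F); bytes_as_latin1() is a made-up name, and PHP 7 or later is assumed. It treats each UTF-8 byte of "αβγδεζηθ" as one Latin-1 character and re-encodes the result as UTF-8 so it can be displayed:

```php
<?php
// Hypothetical helper: read each byte as one ISO-8859-1 character
// (byte value = code point) and re-encode it as UTF-8.
function bytes_as_latin1(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        $out .= $b < 0x80
            ? chr($b)                                          // ASCII unchanged
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F)); // U+0080..U+00FF
    }
    return $out;
}

$utf8 = "\xCE\xB1\xCE\xB2\xCE\xB3\xCE\xB4\xCE\xB5\xCE\xB6\xCE\xB7\xCE\xB8"; // "αβγδεζηθ"
echo bytes_as_latin1($utf8), "\n"; // prints Î±Î²Î³Î´ÎµÎ¶Î·Î¸
```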

The problem is that PHP's C sources use only the plain (narrow) string and filesystem functions, which work only with the current locale's codepage. PHP does not use the wide-character filesystem functions from the Windows SDK, as Apache does.

BTW:
With a current PHP 6 snapshot (full Unicode support?), this doesn't work either.

Regards,
Carsten

BTW:
There is another bug in this bug tracker: you can't use UTF-8 characters in bug reports. After a comment is submitted, UTF-8 characters are replaced with entities, but all comments are placed between <pre> tags, so the browser shows the entities instead of the correct characters.

Please open this HTML page in a browser:
| <html>
| <head>
| <meta http-equiv=content-type content="text/html; charset=UTF-8">
| </head>
| <body>
| &#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;
| </body>
| </html>
and replace all entities in my comment with the characters you see in the browser.
 [2007-11-12 10:03 UTC] tony2001@php.net
PHP doesn't care whether it's Unicode or not; it just passes the filename to the OS's filesystem function, and if that fails, there is nothing we can do about it.
 [2010-11-10 17:33 UTC] anton85s at mail dot ru
"it just passes the filename to the OSes filesystem func and if it fails - we can do nothing about it."
But it doesn't pass the filename to the Unicode version of the filesystem function, right? That means PHP could at least be modified to use the correct (Unicode) filesystem functions instead of the non-Unicode ones for all calls.
 [2010-11-10 18:11 UTC] pajoye@php.net
Yes, and I'm working on this change; it will accept UTF-8 as input, just as we do on Unix/POSIX systems.
 [2012-02-04 21:33 UTC] sebastian dot mayer at maysoft dot de
Hello,

Running PHP 5.3.5 on Windows, I have a problem with the degree character "°".
Both scandir() and opendir()/readdir() return an entry called "Up-wards at 45°.mp3". When I try to get its file size or file time, the message "filesize(): stat failed" comes up. Other characters (for example German umlauts ä, ü, ...) don't have this problem.
 [2023-01-09 12:42 UTC] hayatsizyazilim at gmail dot com
The following patch has been added/updated:

Patch Name: sa
Revision:   1673268135
URL:        https://bugs.php.net/patch-display.php?bug=43148&patch=sa&revision=1673268135
 