Request #33350 php can't handle utf-8 in path names
Submitted: 2005-06-15 11:59 UTC Modified: 2016-08-08 09:52 UTC
Votes:6
Avg. Score:4.2 ± 0.9
Reproduced:4 of 4 (100.0%)
Same Version:1 (25.0%)
Same OS:4 (100.0%)
From: php-bug dot scyt at spamgourmet dot com Assigned: ab (profile)
Status: Closed Package: *General Issues
PHP Version: 5.0.4 OS: Windows *
Private report: No CVE-ID: None
 [2005-06-15 11:59 UTC] php-bug dot scyt at spamgourmet dot com
Description:
------------
Apache2 on Windows NT based operating systems uses utf-8 to encode pathnames. 

PHP fails with "file not found" error messages if it encounters such paths. Both mod_php and php-cgi.exe are affected.

Reproduce code:
---------------
Rename a directory that contains a PHP script to "testÄ", that is, "test" followed by a capital A with two dots above it (code point 196 in Latin-1). Then try to access this PHP script through the web server.
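The mismatch behind the "file not found" error can be illustrated outside PHP. The following is a minimal Python sketch, not PHP's actual code path. It assumes the server passes the directory name as UTF-8 bytes and that the system's 8-bit ("ANSI") code page is Windows-1252; both are assumptions made for illustration.

```python
# Directory name as described in the report: "test" + capital A with diaeresis.
name = "test\u00c4"

# Assumption: the web server passes the path along as UTF-8 bytes.
utf8_bytes = name.encode("utf-8")

# Assumption: the 8-bit code page is Windows-1252. Interpreting the UTF-8
# bytes in that code page yields a different name than the one on disk.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes)       # the two-byte UTF-8 sequence replaces the single character
print(misread == name)  # False: the lookup targets a name that does not exist
```

The single character at code point 196 becomes two bytes in UTF-8, and an 8-bit interface reads those as two unrelated characters.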


 [2005-06-15 13:44 UTC] edink@php.net
The underlying filesystem is not UTF-8 based so there is very little PHP can do here.
 [2005-06-15 14:42 UTC] php-bug dot scyt at spamgourmet dot com
Nice try. :)

It should do the same thing Apache does: convert the UTF-8 string to the system's character encoding. That is either UTF-16 or the 8-bit character encoding Windows is currently set to.

You could do the conversion to the 8-bit character encoding. That would save some code changes and would work in most cases. But there are systems where this would still fail, e.g. one that has directory names with German umlauts while its 8-bit character encoding lacks these umlauts (e.g. a Russian one). The UTF-16 way would work in this scenario and is therefore preferable.
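The difference between the two conversion targets can be sketched in a few lines of Python. This is only an illustration of the encodings involved, not of Apache's or PHP's implementation. The code pages cp1252 (Western European) and cp1251 (Cyrillic) are chosen as examples, and the helper names are hypothetical.

```python
from typing import Optional

# Assumption: the incoming path arrives as UTF-8 bytes.
utf8_path = "test\u00c4".encode("utf-8")


def to_code_page(raw: bytes, code_page: str) -> Optional[bytes]:
    """Convert UTF-8 bytes to an 8-bit code page; None if a character is missing."""
    try:
        return raw.decode("utf-8").encode(code_page)
    except UnicodeEncodeError:
        return None


def to_utf16(raw: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16LE, which can represent every Unicode character."""
    return raw.decode("utf-8").encode("utf-16-le")


western = to_code_page(utf8_path, "cp1252")   # works: the umlaut exists in cp1252
cyrillic = to_code_page(utf8_path, "cp1251")  # fails: no umlaut in cp1251
wide = to_utf16(utf8_path)                    # works regardless of the code page

print(western, cyrillic, wide)
```

The 8-bit route succeeds or fails depending on how the machine is configured, while the UTF-16 route does not depend on that setting.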
 [2013-09-12 17:50 UTC] bvibber at wikimedia dot org
So, this is still an issue that affects programs like MediaWiki running on Windows servers: https://bugzilla.wikimedia.org/show_bug.cgi?id=1780

Windows NT/XP/Vista/7/8/etc use Unicode for file names, but the POSIX filesystem APIs expose only the "ANSI" interfaces, which use 8-bit or DBCS locale-specific encodings.

Since PHP's filesystem APIs seem to use the POSIX ones, you can't use UTF-8 for filenames as one expects on Linux, Mac OS X, etc. Not only this, but there's no general way to simply switch encodings -- you'll be limited to the 8-bit encoding's character set, so can't for instance upload a Chinese-named file to a Russian-configured server.

Some other cross-platform toolkits like glib use Win32's Unicode filesystem APIs internally, which allow using the full Unicode character set for filenames, and expose the Unicode strings as UTF-8 for C null-terminated string compatibility.
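The boundary such toolkits draw can be sketched as a pair of conversions: UTF-8 on the application side, UTF-16 on the operating system side. The Python sketch below only models the encodings. It does not call any Win32 or glib function, and the function names and sample file names are hypothetical.

```python
def utf8_to_wide(name: bytes) -> bytes:
    """Application side (UTF-8) -> OS side (UTF-16LE code units)."""
    return name.decode("utf-8").encode("utf-16-le")


def wide_to_utf8(name: bytes) -> bytes:
    """OS side (UTF-16LE code units) -> application side (UTF-8)."""
    return name.decode("utf-16-le").encode("utf-8")


# File names in different scripts that no single 8-bit code page covers.
names = ["\u043e\u0442\u0447\u0451\u0442.txt", "\u6587\u4ef6.txt", "test\u00c4.php"]

for name in names:
    utf8 = name.encode("utf-8")
    # UTF-8 never contains a zero byte for these names, so it fits in a
    # C null-terminated string; the UTF-16 form does contain zero bytes.
    assert b"\x00" not in utf8
    assert b"\x00" in utf8_to_wide(utf8)
    # The round trip is lossless, whatever script the name is written in.
    assert wide_to_utf8(utf8_to_wide(utf8)) == utf8
```

Because the round trip is lossless, the application can keep handling ordinary byte strings while the full Unicode range stays available for file names.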
 [2015-01-08 23:38 UTC] ajf@php.net
-Package: Feature/Change Request
+Package: *General Issues
-Operating System: Windows XP
+Operating System: Windows *
 [2015-01-08 23:39 UTC] ajf@php.net
We'd be essentially fixing Microsoft's mistakes, but I think this should be fixed.
 [2016-08-08 09:52 UTC] ab@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: ab
 [2016-08-08 09:52 UTC] ab@php.net
Fixed in PHP 7.1

Thanks.
Last updated: Sat Nov 23 10:01:28 2024 UTC