Request #33350 php can't handle utf-8 in path names
Submitted: 2005-06-15 11:59 UTC Modified: 2016-08-08 09:52 UTC
Avg. Score:4.2 ± 0.9
Reproduced:4 of 4 (100.0%)
Same Version:1 (25.0%)
Same OS:4 (100.0%)
From: php-bug dot scyt at spamgourmet dot com Assigned: ab (profile)
Status: Closed Package: *General Issues
PHP Version: 5.0.4 OS: Windows *
Private report: No CVE-ID: None


 [2005-06-15 11:59 UTC] php-bug dot scyt at spamgourmet dot com
Apache2 on Windows NT based operating systems uses utf-8 to encode pathnames. 

PHP fails with "file not found" error messages if it encounters such paths. Both mod_php and php-cgi.exe are affected.

Reproduce code:
Rename a directory that contains a PHP script to "testÄ". That is a capital A with two dots above it, code point 196 in Latin-1. Then try to access this PHP script through the web server.


 [2005-06-15 13:44 UTC]
The underlying filesystem is not UTF-8 based so there is very little PHP can do here.
 [2005-06-15 14:42 UTC] php-bug dot scyt at spamgourmet dot com
Nice try. :)

It should do the same thing Apache does: convert the UTF-8 string to the system's character encoding. That is either UTF-16 or the 8-bit character encoding Windows is currently set to.

You could do the conversion to the 8-bit character encoding. That would save some code changes and would work in most cases. But there are systems where this would still fail, e.g. having directory names with German umlauts while the 8-bit character encoding is set to one without these umlauts (e.g. Russian). The UTF-16 way would work in this scenario and is therefore preferable.
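
To make the difference between the two conversions concrete, here is a minimal sketch in Python (not PHP, and not what Apache or PHP actually run). The helper `to_ansi` and the choice of code pages cp1252 (Western European) and cp1251 (Cyrillic) are assumptions made for illustration only.

```python
# Illustration only: how the directory name from the report behaves
# under an 8-bit code page versus UTF-16.
name = "test\u00c4"  # "testÄ"; Ä is code point 196 in Latin-1

utf8_bytes = name.encode("utf-8")  # the form Apache passes on: b"test\xc3\x84"


def to_ansi(utf8_path: bytes, codepage: str):
    """Hypothetical helper: convert a UTF-8 path to an 8-bit code page.

    Returns None when the name cannot be represented in that code page.
    """
    try:
        return utf8_path.decode("utf-8").encode(codepage)
    except UnicodeEncodeError:
        return None


# A Western European system can represent the umlaut in its 8-bit encoding.
western = to_ansi(utf8_bytes, "cp1252")

# A system set to a Cyrillic code page cannot, so the conversion fails.
cyrillic = to_ansi(utf8_bytes, "cp1251")

# UTF-16 can represent the name regardless of the configured code page.
utf16 = utf8_bytes.decode("utf-8").encode("utf-16-le")

print(western, cyrillic, utf16)
```

The 8-bit route only works when the configured code page happens to contain every character in the path; the UTF-16 route has no such dependency.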
 [2013-09-12 17:50 UTC] bvibber at wikimedia dot org
So, this is still an issue that affects programs like MediaWiki running on Windows servers:

Windows NT/XP/Vista/7/8/etc use Unicode for file names, but the POSIX filesystem APIs expose only the "ANSI" interfaces, which use 8-bit or DBCS locale-specific encodings.

Since PHP's filesystem APIs seem to use the POSIX ones, you can't use UTF-8 for filenames as one expects on Linux, Mac OS X, etc. Not only this, but there's no general way to simply switch encodings -- you'll be limited to the 8-bit encoding's character set, so can't for instance upload a Chinese-named file to a Russian-configured server.

Some other cross-platform toolkits like glib use Win32's Unicode filesystem APIs internally, which allow using the full Unicode character set for filenames, and expose the Unicode strings as UTF-8 for C null-terminated string compatibility.
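
A rough sketch of that approach, written in Python purely for illustration: callers keep passing UTF-8, and the path is converted to UTF-16 just before it would reach a wide-character API. `wide_path` is a hypothetical helper, not a glib, Win32, or PHP function, and cp1252/cp1251 stand in for whatever "ANSI" code page a server is set to.

```python
# Illustration only; wide_path is a hypothetical helper.

def wide_path(utf8_path: bytes) -> bytes:
    """Turn a UTF-8 path into a null-terminated UTF-16LE buffer,
    the form taken by wide-character file functions on Windows."""
    return utf8_path.decode("utf-8").encode("utf-16-le") + b"\x00\x00"


umlaut = "test\u00c4".encode("utf-8")          # "testÄ"
chinese = "\u6587\u4ef6.txt".encode("utf-8")   # a Chinese file name

# Handing the UTF-8 bytes straight to an "ANSI" interface means they are
# read in the active code page, which yields a different name entirely.
misread = umlaut.decode("cp1252")

# The Chinese name has no representation in a Cyrillic code page at all.
try:
    chinese.decode("utf-8").encode("cp1251")
    fits_cp1251 = True
except UnicodeEncodeError:
    fits_cp1251 = False

# Going through UTF-16 preserves both names exactly.
round_trips = all(
    wide_path(p)[:-2].decode("utf-16-le").encode("utf-8") == p
    for p in (umlaut, chinese)
)

print(repr(misread), fits_cp1251, round_trips)
```

The misread name is what the filesystem ends up being asked for, which is consistent with the "file not found" errors in the original report.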
 [2015-01-08 23:38 UTC]
-Package: Feature/Change Request
+Package: *General Issues
-Operating System: Windows XP
+Operating System: Windows *
 [2015-01-08 23:39 UTC]
We'd be essentially fixing Microsoft's mistakes, but I think this should be fixed.
 [2016-08-08 09:52 UTC]
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: ab
 [2016-08-08 09:52 UTC]
Fixed in PHP 7.1
