Bug #37738 basename does not work with Japanese
Submitted: 2006-06-08 06:09 UTC Modified: 2011-04-11 00:19 UTC
Votes:17
Avg. Score:4.6 ± 0.7
Reproduced:13 of 13 (100.0%)
Same Version:6 (46.2%)
Same OS:1 (7.7%)
From: joey at alegria dot co dot jp Assigned:
Status: Not a bug Package: *Directory/Filesystem functions
PHP Version: 5CVS-2006-06-08 (CVS) OS: Fedora Core 4
Private report: No CVE-ID: None
 [2006-06-08 06:09 UTC] joey at alegria dot co dot jp
Description:
------------
Simply put, basename() does not work with Japanese file paths. If the filename is Japanese, only the extension part of the filename is returned, so a filename "/folder/ファイル名.txt" resolves to just ".txt". I discovered the problem when calling basename() on the 'name' element of the $_FILES array for uploaded Japanese files; however, after testing, the bug occurs no matter how you supply the filename.

My PHP environment is running with UTF-8 internal encoding.

The code snippet below illustrates this perfectly.

Reproduce code:
---------------
<?php
// show normal behavior with roman filename
$filename='/myfolder/roman_filename.txt';
echo "The full filename of the romanized file is $filename.\n"; // /myfolder/roman_filename.txt
$basename=basename($filename);
echo "The basename of the romanized file is $basename.\n"; // roman_filename.txt
// show behavior with Japanese filename
$filename='/myfolder/日本語のファイル名.txt';
echo "The full filename of the Japanese file is $filename.\n"; // /myfolder/日本語のファイル名.txt
$basename=basename($filename);
echo "The basename of the Japanese file is $basename."; // .txt
?>

Expected result:
----------------
The full filename of the romanized file is /myfolder/roman_filename.txt.
The basename of the romanized file is roman_filename.txt.
The full filename of the Japanese file is /myfolder/日本語のファイル名.txt.
The basename of the Japanese file is 日本語のファイル名.txt.

Actual result:
--------------
The full filename of the romanized file is /myfolder/roman_filename.txt.
The basename of the romanized file is roman_filename.txt.
The full filename of the Japanese file is /myfolder/日本語のファイル名.txt.
The basename of the Japanese file is .txt.

History

 [2006-06-08 06:37 UTC] derick@php.net
Won't fix in PHP 5. This will be implemented for PHP 6.
 [2011-04-10 22:23 UTC] chx@php.net
-Status: Wont fix +Status: Open
 [2011-04-10 22:23 UTC] chx@php.net
I am reopening this as PHP 6 is not developed any more and the bug is still valid.
 [2011-04-11 00:19 UTC] cataphract@php.net
-Status: Open +Status: Bogus
 [2011-04-11 00:19 UTC] cataphract@php.net
For better or worse, basename is affected by the locale. The encoding used in the filenames must match the set locale.

Bogus.
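The locale dependence described above can be sketched as follows. This is a minimal illustration, not a guaranteed fix: the locale names passed to setlocale() are assumptions, and which of them exist depends on the operating system. The idea is to select a locale whose character encoding matches the encoding of the filename (UTF-8 here) before calling basename().

```php
<?php
// Try several UTF-8 locale names in order; setlocale() returns the first
// one the system accepts, or false if none is available.
// The names below are assumptions and vary between systems.
$locale = setlocale(LC_CTYPE, 'ja_JP.UTF-8', 'en_US.UTF-8', 'C.UTF-8');

if ($locale === false) {
    echo "No UTF-8 locale available; basename() may mishandle multibyte names.\n";
} else {
    echo "LC_CTYPE set to $locale\n";
}

// The filename is a UTF-8 string, so it matches a UTF-8 locale.
echo basename('/myfolder/日本語のファイル名.txt'), "\n";
```

Whether the last line prints the full Japanese name depends on the PHP version and on whether a matching locale was found, so no expected output is given here.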
 [2012-06-06 20:46 UTC] ion at 66 dot ru
We want to support many locales, and we do not know the locale of a filename before the user adds the file. How can we use these filenames without transliteration?
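One possible answer to the question above is to avoid basename() and split the path byte-wise on "/". The helper below, locale_free_basename(), is hypothetical and not part of PHP. It is a sketch that assumes paths use "/" as the only separator and that filenames are in an encoding such as UTF-8, where the byte 0x2F never occurs inside a multibyte character. It does not handle Windows backslash separators.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the last path component
// without consulting the current locale. Assumes "/" is the only separator.
function locale_free_basename($path, $suffix = '')
{
    // Drop trailing slashes so "/a/b/" yields "b".
    $trimmed = rtrim($path, '/');
    if ($trimmed === '') {
        return '';
    }

    // Find the last "/" byte; everything after it is the file name.
    $pos = strrpos($trimmed, '/');
    $name = ($pos === false) ? $trimmed : substr($trimmed, $pos + 1);

    // Strip the suffix only when it is a proper tail of the name.
    if ($suffix !== '' && $name !== $suffix
        && substr($name, -strlen($suffix)) === $suffix) {
        $name = substr($name, 0, -strlen($suffix));
    }

    return $name;
}

echo locale_free_basename('/myfolder/日本語のファイル名.txt'), "\n";
echo locale_free_basename('/myfolder/日本語のファイル名.txt', '.txt'), "\n";
```

Because the function only looks for the single byte "/", it returns the same result under any locale setting. For uploaded files, the value in the $_FILES array's 'name' element should still be validated separately before it is used on disk.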
 [2013-09-15 10:20 UTC] kenji dot uui at gmail dot com
With PHP 5.5.1 on Windows, I got the results below after adding "setlocale(LC_ALL, 'japanese');" to the script.

The full filename of the romanized file is /myfolder/roman_filename.txt.
The basename of the romanized file is roman_filename.txt.
The full filename of the Japanese file is /myfolder/日本語のファイル名.txt.
The basename of the Japanese file is 日本語のファイル名.txt.
 [2013-10-30 15:10 UTC] fleshgrinder at gmx dot at
Just gave the following a test with PHP 5.5.5:

<?php

echo
  basename(__DIR__ . "/english.txt") , PHP_EOL ,
  basename(__DIR__ . "/日本語.txt") , PHP_EOL ,
  basename(__DIR__ . "/ελληνικά.txt") , PHP_EOL ,
  basename(__DIR__ . "/english.txt", "txt") , PHP_EOL ,
  basename(__DIR__ . "/日本語.txt", "txt") , PHP_EOL ,
  basename(__DIR__ . "/ελληνικά.txt", "txt") , PHP_EOL
;

?>

PHP returned everything correctly; the server's locale (LC) was set to POSIX.
 