Bug #37268 basename() doesn't work with non-Latin first letters of the file name.
Submitted: 2006-05-01 23:56 UTC Modified: 2015-03-12 08:39 UTC
Votes:28
Avg. Score:4.1 ± 1.0
Reproduced:21 of 23 (91.3%)
Same Version:9 (42.9%)
Same OS:6 (28.6%)
From: spam dot bugs dot php dot net at vano dot org Assigned:
Status: Wont fix Package: Filesystem function related
PHP Version: 5.1.2 OS: Fedora Core 4
Private report: No CVE-ID: None
 [2006-05-01 23:56 UTC] spam dot bugs dot php dot net at vano dot org
Description:
------------
If a file name starts with a non-Latin letter (Cyrillic) rather than with a non-letter character, basename() cuts off the beginning of the name up to the first Latin letter or non-letter character.
In my tests I was using the Windows-1251 (CP1251) encoding, not Unicode, so the problem is in the basename() function itself and not in the handling of multi-byte charsets.


P.S. If you have problems with the text encoding of the example below, you can see a live example and test your own inputs here:
http://examples.vano.org/basename.php

Reproduce code:
---------------
<?php
echo basename("/test/blah/music/???????latin.mp3");
?>

Expected result:
----------------
???????latin.mp3

Actual result:
--------------
latin.mp3

Patches

UTF8 (last revision 2011-05-18 08:18 UTC by vsz at ya dot ru)

Pull Requests

History

 [2006-05-02 00:55 UTC] judas dot iscariote at gmail dot com
You have to wait for PHP 6; it will address these problems.
 [2010-07-31 23:21 UTC] daniel at mameso dot com
A workaround:

# Workaround for basename() cutting off a leading UTF-8 multibyte character:
# prefix the last path segment with an ASCII letter, then strip that letter again.
function mb_basename($filepath, $suffix = '') {
	// Drop trailing slashes and spaces, then split the path into its segments.
	$segments = explode('/', rtrim($filepath, '/ '));
	return substr(basename('X' . end($segments), $suffix), 1);
}

Have fun,
Daniel.

PS: The problem does not exist under Mac OS X 10.6.x with ZendServerCE 5.0 & PHP 5.3.
 [2015-03-12 04:20 UTC] vovan-ve at yandex dot ru
Are you sure this problem will be solved? http://3v4l.org/BWgUm
 [2015-03-12 08:39 UTC] mike@php.net
You need to set the correct locale for stdlib/multibyte functions to work correctly; anyway, 3v4l.org apparently only has C/POSIX locale available:

http://3v4l.org/M6jhm
 [2016-03-19 09:29 UTC] lauri dot kentta at gmail dot com
The default configuration of php-fpm clears the environment, which leads to this problem. Would it be possible to use UTF-8 or even default_charset or at least accept any 8-bit characters if the locale is unset or C? It's a bit strange that the current behaviour can't be controlled through php.ini but instead must be set with setlocale or environment variables.
 
Last updated: Mon Dec 02 23:01:29 2024 UTC