Bug #37945 pathinfo() cannot handle argument with special characters like german "Umlaute"
Submitted: 2006-06-28 10:46 UTC Modified: 2006-07-17 20:36 UTC
From: andreas dot schmidt at stasy dot de Assigned: moriyoshi (profile)
Status: Closed Package: *Directory/Filesystem functions
PHP Version: 5.1.4 OS: Linux
Private report: No CVE-ID: None
 [2006-06-28 10:46 UTC] andreas dot schmidt at stasy dot de
Description:
------------
When pathinfo() is passed an argument that contains special characters like German umlauts ??? or French ?, the returned array contains wrong information. If the special characters are in a directory name, the value for the key "basename" is the name of that directory's parent directory.

The script was tested in UTF-8 and ISO-8859-1 with the same results.

This bug occurred at least with
a) a 'hand rolled' PHP 5.1.4 on Apache 1.3.36
b) PHP 5.1.2 on Apache 2.0.55 (Ubuntu 6.06 package)
c) PHP 5.0.3 on Apache 2.0.53 (SuSE Linux 9.3)

The problem does not occur when the script is run in a Win2k environment.

Furthermore, it seems to be a PHP 5 bug, as it did not occur when run with PHP 4.

Reproduce code:
---------------
<?php
$dir = 'demo/testdir/????';
$pinfo = pathinfo($dir);
echo $pinfo['basename'];
?>

Expected result:
----------------
????

Actual result:
--------------
testdir

 [2006-06-28 11:08 UTC] tony2001@php.net
Unicode support will appear only in PHP6, you have to wait until then.
 [2006-06-28 11:20 UTC] andreas dot schmidt at stasy dot de
This bug has nothing to do with Unicode!!!
The bug occurs when special characters like ???? are used. These characters are part of the ISO-8859-1 character set!
 [2006-06-29 15:25 UTC] mike@php.net
This does indeed seem to be a regression introduced in PHP 5.
http://cvs.php.net/viewvc.cgi/php-src/ext/standard/string.c?r1=1.405&r2=1.406&pathrev=PHP_5_2
 [2006-07-02 22:30 UTC] moriyoshi@php.net
Set the LANG (or LC_CTYPE, if necessary) environment variable to the correct value. The function expects your filesystem's locale to match the one given by the environment variable. For this to work, the libc locale data has to be set up as well.

If you are using pathinfo() / dirname() / basename() on URIs, just don't do that. These functions are not designed for that purpose.
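
The locale advice in this comment can be sketched in PHP as follows. This is only a minimal illustration, not the fix that was later committed: the helper name pathinfo_with_locale(), the candidate locale names, and the example path are assumptions that do not come from the report, and whether the chosen locale changes the result depends on the PHP version and the locales installed on the system.

```php
<?php
// Hypothetical helper (not from the report): select a character-type locale
// before parsing a path, then return pathinfo()'s result.
function pathinfo_with_locale(string $path, array $candidates): array
{
    // setlocale() accepts an array of candidates and returns the first one
    // the system accepts, or false if none of them is available.
    $chosen = setlocale(LC_CTYPE, $candidates);

    $info = pathinfo($path);
    $info['locale'] = $chosen; // false means no candidate was available

    return $info;
}

// Assumed locale names; the ones actually installed vary by system.
$candidates = ['en_GB.UTF-8', 'en_US.UTF-8', 'C.UTF-8'];

// Assumed example path with UTF-8 encoded umlauts in the last component.
$info = pathinfo_with_locale("demo/testdir/\u{00E4}\u{00F6}\u{00FC}", $candidates);

var_dump($info['locale'], $info['dirname'], $info['basename']);
```

If the 'locale' entry comes back as false, none of the candidate locales is installed, which corresponds to the missing libc locale data mentioned above.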

 [2006-07-03 07:13 UTC] mike@php.net
There still seems to be a regression with utf8 and PHP-5:

mike@honeybadger:~$ php -r 'var_dump(pathinfo("/usr/bin/?ggi")); echo `printenv|egrep "LANG|LC"`;';
array(2) {
  ["dirname"]=>
  string(8) "/usr/bin"
  ["basename"]=>
  string(3) "ggi"
}
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en

mike@honeybadger:~$ php -r 'setlocale(LC_ALL, getenv("LANG")); var_dump(pathinfo("/usr/bin/?ggi")); echo `printenv|egrep "LANG|LC"`;';
array(2) {
  ["dirname"]=>
  string(8) "/usr/bin"
  ["basename"]=>
  string(3) "ggi"
}
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en

mike@honeybadger:~$ php -r 'var_dump(pathinfo("/usr/bin/????")); echo `printenv|egrep "LANG|LC"`;';
array(2) {
  ["dirname"]=>
  string(8) "/usr/bin"
  ["basename"]=>
  string(3) "bin"
}
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en



Same on latin1 terminal:

s1-iw:~$ php -r 'var_dump(pathinfo("/usr/bin/?ggi")); echo `printenv|egrep "LANG|LC"`;';
array(2) {
  ["dirname"]=>
  string(8) "/usr/bin"
  ["basename"]=>
  string(4) "?ggi"
}
LC_ALL=en_US.iso-8859-1
LANG=en_US.iso-8859-1

s1-iw:~$ php -r 'setlocale(LC_ALL, getenv("LANG")); var_dump(pathinfo("/usr/bin/?ggi")); echo `printenv|egrep "LANG|LC"`;';
array(2) {
  ["dirname"]=>
  string(8) "/usr/bin"
  ["basename"]=>
  string(4) "?ggi"
}
LC_ALL=en_US.iso-8859-1
LANG=en_US.iso-8859-1

s1-iw:~$ php -r 'var_dump(pathinfo("/usr/bin/????")); echo `printenv|egrep "LANG|LC"`;';
array(2) {
  ["dirname"]=>
  string(8) "/usr/bin"
  ["basename"]=>
  string(4) "????"
}
LC_ALL=en_US.iso-8859-1
LANG=en_US.iso-8859-1


Everything also works well with PHP 4 in a UTF-8 console.
 [2006-07-17 20:36 UTC] mike@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 