Bug #73877 readlink() returns garbage for UTF paths
Submitted: 2017-01-05 23:42 UTC Modified: 2017-01-07 23:01 UTC
From: kulakov74 at yandex dot ru Assigned: ab
Status: Closed Package: Filesystem function related
PHP Version: 7.1.0 OS: Windows 8.1
Private report: No CVE-ID:
 [2017-01-05 23:42 UTC] kulakov74 at yandex dot ru
Description:
------------
I work in CLI on Windows. 
After 'Fixed UTF-8 and long path support on Windows' finally landed in PHP 7.1.0, I installed it, and after changing default_charset to UTF-8 things seem to work. But when I call readlink() with Cyrillic paths, it returns the same path cut off somewhere near the end, with a garbage character appended. For example: 

"C:\Users\Серёжка" -> "C:\Users\Сер�"

readlink() is supposed to return the same path for regular directories, which it does for paths that contain no non-ASCII characters. 

Test script:
---------------
$FullName="C:\Users\Серёжка";
echo(readlink($FullName));


History

 [2017-01-06 19:51 UTC] ab@php.net
-Status: Open +Status: Feedback
 [2017-01-06 19:51 UTC] ab@php.net
Thanks for the report. Double quotes lead to backslash escape interpretation, so the backslashes need to be escaped. I also need to know what kind of link this is. And is the string passed UTF-8 encoded? Could you provide the exact mklink command or similar, so I can get a reproducer? I've just tried with a junction and see no issue.

Thanks.
 [2017-01-06 20:24 UTC] kulakov74 at yandex dot ru
Yes, thank you, but in this particular case it doesn't matter - everything works the same way with "\\". 
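
For context on the quoting question, here is a minimal sketch (assuming the source file is saved as UTF-8). In a double-quoted PHP string only recognized escape sequences are interpreted, so `\U` and `\С` keep their backslashes, which would explain why the test script behaves the same either way. A path segment starting with `t` or `n` would not be so lucky. The variable names are made up for illustration.

```php
<?php
// Double-quoted: "\U" and "\С" are not escape sequences, so the backslashes survive.
$double = "C:\Users\Серёжка";
// Single-quoted: backslashes are literal (except before ' or \).
$single = 'C:\Users\Серёжка';
// Double-quoted with the backslashes escaped explicitly.
$escaped = "C:\\Users\\Серёжка";

var_dump($double === $single);  // bool(true)
var_dump($double === $escaped); // bool(true)

// Counter-example: "\t" IS an escape sequence, so this path contains a TAB.
$broken = "C:\temp";
var_dump($broken === 'C:\temp'); // bool(false)
var_dump(strlen($broken));       // int(6): C, :, TAB, e, m, p
```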

And sorry, I didn't make it quite clear: in my case, the directory is NOT a link, so readlink() should return the dir itself, not a target, but it DOESN'T, so my script thinks it IS A LINK. So no mklink command was used. 

I use
ini_set('default_charset', 'UTF-8'); 
and then I get the dir from opendir()/readdir(), so it should be UTF-8.
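
To illustrate why a truncated result makes a script treat the directory as a link, here is a minimal sketch of the kind of comparison described above. `looksLikeLink` is a hypothetical helper, not the reporter's actual code; it takes the readlink() result as an argument so the logic can be shown without touching the filesystem.

```php
<?php
// Hypothetical helper: decide "is this a link?" from a readlink()-style result.
// $target is what readlink($path) returned; false means the call failed.
function looksLikeLink(string $path, $target): bool
{
    return $target !== false && $target !== $path;
}

$path = 'C:\Users\Серёжка';

// Behaviour the reporter expects for a regular directory: the same path comes back.
var_dump(looksLikeLink($path, $path)); // bool(false)

// Behaviour from the report: the result is cut short, so it no longer matches.
var_dump(looksLikeLink($path, substr($path, 0, 16))); // bool(true)
```

PHP also provides is_link() for checking a path directly instead of comparing strings.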
 [2017-01-06 20:34 UTC] kulakov74 at yandex dot ru
A follow-up: I tried the test script with various directories, and the returned string always has the last directory name cut off about halfway, while the first half that is returned is correct (not garbage). So it looks like the problem is simply that not enough length is allocated for the result string.
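
The symptom is consistent with a length counted in characters being used as a length in bytes, though nothing in this report confirms that as the actual cause. A minimal sketch that reproduces the same output in pure PHP, assuming the source file is saved as UTF-8; `truncateToCharCount` is a hypothetical helper and not anything from php-src:

```php
<?php
// Hypothetical helper: cut a UTF-8 string to as many BYTES as it has CHARACTERS,
// imitating a buffer sized by character count instead of byte count.
function truncateToCharCount(string $utf8): string
{
    // With the /u modifier "." matches one code point; the return value is the match count.
    $chars = preg_match_all('/./su', $utf8);
    return substr($utf8, 0, $chars);
}

$path = 'C:\Users\Серёжка';
$cut  = truncateToCharCount($path);

echo strlen($path), "\n"; // 23 bytes: 9 ASCII + 7 Cyrillic letters at 2 bytes each
echo strlen($cut), "\n";  // 16 bytes: 9 ASCII + "Сер" (6 bytes) + the first byte of "ё"

// preg_match() with /u fails on invalid UTF-8, so it flags the dangling byte.
var_dump(preg_match('//u', $cut) === 1); // bool(false)
```

The 16-byte result ends in half of a two-byte character, which is what a terminal would render as a single garbage character after "Сер".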
 [2017-01-06 21:39 UTC] ab@php.net
-Status: Feedback +Status: Verified
 [2017-01-07 00:12 UTC] ab@php.net
Automatic comment on behalf of ab
Revision: http://git.php.net/?p=php-src.git;a=commit;h=0f410f8087f44f3ce347a207f90be64ce3f64d80
Log: Fixed bug #73877 readlink() returns garbage for UTF-8 paths
 [2017-01-07 00:12 UTC] ab@php.net
-Status: Verified +Status: Closed
 [2017-01-07 00:44 UTC] ab@php.net
-Assigned To: +Assigned To: ab
 [2017-01-07 00:44 UTC] ab@php.net
It'd rock if you could check with the latest snapshots, ideally also with a UNC path.

Thanks.
 [2017-01-07 20:25 UTC] kulakov74 at yandex dot ru
It works for all:
c:\Temp\Сережка
\\Pc3\абвгдеабвгдеабвгдеабвгде
 [2017-01-07 23:01 UTC] ab@php.net
Thanks very much for the checks!
 
Last updated: Sun Apr 30 18:01:35 2017 UTC