Bug #73877 readlink() returns garbage for UTF paths
Submitted: 2017-01-05 23:42 UTC
Modified: 2017-01-07 23:01 UTC
From: kulakov74 at yandex dot ru
Assigned: ab
Status: Closed
Package: Filesystem function related
PHP Version: 7.1.0
OS: Windows 8.1
Private report: No
CVE-ID: None


 [2017-01-05 23:42 UTC] kulakov74 at yandex dot ru
I work in CLI on Windows. 
After PHP 7.1.0 finally 'Fixed UTF-8 and long path support on Windows', I installed it, and after changing default_charset to UTF-8 things seem to work. However, when I call readlink() with a Cyrillic path, it returns the same path cut off somewhere near the end, with a garbage character appended. For example:

"C:\Users\Серёжка" -> "C:\Users\Сер�"

readlink() is supposed to return the same path for regular directories, and it does so for paths that contain no non-ASCII characters.

Test script:


 [2017-01-06 19:51 UTC]
-Status: Open +Status: Feedback
 [2017-01-06 19:51 UTC]
Thanks for the report. Double quotes cause backslash escape sequences to be interpreted, so the backslashes need to be escaped. I also need to know what kind of link this is, and whether the string passed in is UTF-8 encoded. Could you provide the exact mklink command, or something similar, so I can reproduce it? I've just tried with a junction and see no issue.

 [2017-01-06 20:24 UTC] kulakov74 at yandex dot ru
Yes, thank you, but in this particular case it doesn't matter - everything works the same way with "\\". 
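
As a side note on the quoting question, here is a minimal sketch of why double quotes happen not to matter for this particular path. PHP only interprets a fixed set of escape sequences in double-quoted strings, and neither `\U` nor a backslash followed by a Cyrillic letter is one of them. The second path (`C:\temp\new`) is a made-up example, not taken from the report, chosen because `\t` and `\n` are escape sequences.

```php
<?php
// The path from the report, written three ways.
$double  = "C:\Users\Серёжка";    // \U and \С are not escape sequences
$single  = 'C:\Users\Серёжка';    // single quotes: backslashes kept literally
$escaped = "C:\\Users\\Серёжка";  // explicitly escaped backslashes

// All three spellings produce the same bytes for this path.
$sameInThisCase = ($double === $single) && ($single === $escaped);

// Hypothetical path where double quotes do change the string:
// \t becomes a tab and \n becomes a newline.
$risky = "C:\temp\new";
$safe  = 'C:\temp\new';
$differs = ($risky !== $safe);

var_dump($sameInThisCase, $differs);
```

Single quotes or doubled backslashes remain the safer habit for Windows paths, since whether a double-quoted path survives depends on which letters follow each backslash.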

And sorry, I didn't make it quite clear: in my case the directory is NOT a link, so readlink() should return the directory itself, not a target. It DOESN'T, so my script thinks it IS A LINK. No mklink command was used.

I use
ini_set('default_charset', 'UTF-8'); 
and then I get the dir from opendir()/readdir(), so it should be UTF-8.
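
A rough sketch of the setup described above, for readers who want to try something similar. The helper name `classifyEntries()` is hypothetical, and the original test script is not available, so this is not the reporter's code. It also swaps in `is_link()` for link detection instead of comparing the `readlink()` result with the input path, which is an assumption about what the script ultimately wants to know. The UTF-8 check uses PCRE's `/u` modifier so that no extra extension is required.

```php
<?php
ini_set('default_charset', 'UTF-8');

/**
 * Hypothetical helper: list the entries of $dir and record, for each one,
 * whether its name is valid UTF-8 and whether PHP considers it a link.
 */
function classifyEntries(string $dir): array
{
    $result = [];
    $handle = opendir($dir);
    if ($handle === false) {
        return $result;
    }
    while (($name = readdir($handle)) !== false) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        $result[$name] = [
            // preg_match() with /u returns false on invalid UTF-8 input.
            'utf8' => preg_match('//u', $name) === 1,
            // Swapped-in check; does not rely on readlink() echoing the path.
            'link' => is_link($path),
        ];
    }
    closedir($handle);
    return $result;
}
```

Calling `classifyEntries('C:\Users')` would then return one record per entry; how junctions are reported by `is_link()` may vary by PHP version and is not covered here.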
 [2017-01-06 20:34 UTC] kulakov74 at yandex dot ru
A follow-up: I tried the test script with various directories, and the returned string always has the last directory name cut off about halfway, while the first half that is returned is correct (not garbage). So the problem appears to be only that not enough length is allocated for the result string.
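
The symptom can be reproduced in userland under one assumption: that the result was sized by character count while being filled with UTF-8 bytes. The thread does not confirm this as the actual cause, and the sketch below is not PHP's internal readlink() code. It only shows that such a mismatch would cut a Cyrillic name partway through and leave a dangling byte.

```php
<?php
// Illustration only: an ASSUMED bug pattern, not the real implementation.
$path = 'C:\Users\Серёжка';

// Length in UTF-8 bytes: each Cyrillic letter here takes two bytes.
$byteLength = strlen($path);

// Length in Unicode code points, counted with PCRE's /u modifier.
$charLength = preg_match_all('/./us', $path);

// Assumed mistake: keep only as many BYTES as there are CHARACTERS.
$truncated = substr($path, 0, $charLength);

// The cut lands in the middle of a two-byte character, so the result
// is no longer valid UTF-8 and ends in a lone lead byte.
$isValidUtf8 = preg_match('//u', $truncated) === 1;
$lastByteHex = bin2hex(substr($truncated, -1));

printf("bytes=%d chars=%d valid=%s last=0x%s\n",
    $byteLength, $charLength, var_export($isValidUtf8, true), $lastByteHex);
```

For this path the byte length is 23 and the character length is 16, so the truncated string is `C:\Users\Сер` followed by the single byte 0xD1, which a terminal would typically render as a replacement character, much like the output quoted in the report.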
 [2017-01-06 21:39 UTC]
-Status: Feedback +Status: Verified
 [2017-01-07 00:12 UTC]
Automatic comment on behalf of ab
Log: Fixed bug #73877 readlink() returns garbage for UTF-8 paths
 [2017-01-07 00:12 UTC]
-Status: Verified +Status: Closed
 [2017-01-07 00:44 UTC]
-Assigned To: +Assigned To: ab
 [2017-01-07 00:44 UTC]
It'd rock if you could check with the latest snapshots, also involving some UNC path.

 [2017-01-07 20:25 UTC] kulakov74 at yandex dot ru
It works for all:
 [2017-01-07 23:01 UTC]
Thanks very much for the checks!
Last updated: Thu Oct 06 11:03:41 2022 UTC