Bug #73971 Filename got limited to MAX_PATH on Win32 when scan directory
Submitted: 2017-01-22 06:45 UTC Modified: 2017-02-13 07:40 UTC
From: ademybiz at gmail dot com Assigned: ab (profile)
Status: Closed Package: *Directory/Filesystem functions
PHP Version: 7.1.1 OS: Windows 10(10586)
Private report: No CVE-ID: None

 [2017-01-22 06:45 UTC] ademybiz at gmail dot com
Description:
------------
It is awesome that PHP 7.1 implemented the Unicode file APIs on Windows in http://git.php.net/?p=php-src.git;a=commit;h=3d3f11ede4cc7c83d64cc5edaae7c29ce9c6986f.

There is likely one thing left out: PHP can create long file names, but the structs that store directory enumeration results are still limited to 256 bytes, as defined by MAX_PATH.

This goes unnoticed as long as the filename is shorter than 256 ASCII glyphs. But I can create a multibyte filename of up to 256 glyphs that cannot be accessed again, because its actual length is over 256 bytes.

Function scandir() behaves similarly.
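
The gap between glyph count and byte count can be checked directly. The following is a minimal sketch, assuming the script file is saved as UTF-8; utf8_char_count() is a hypothetical helper introduced only for illustration, and the 256-byte cut merely simulates a fixed-size buffer rather than reproducing what the directory API does internally.

```php
<?php
// Hypothetical helper: count UTF-8 characters without needing the mbstring extension.
function utf8_char_count(string $s): int
{
    // Every byte that is not a continuation byte (10xxxxxx) starts a character.
    return (int) preg_match_all('/[^\x80-\xBF]/', $s);
}

$filename = str_repeat('テスト', 48);
$chars = utf8_char_count($filename);
$bytes = strlen($filename); // strlen() counts bytes, not characters

// Simulate a fixed 256-byte buffer: each of these characters is 3 bytes,
// so the cut falls one byte into the 86th character.
$cut = substr($filename, 0, 256);
$completeChars = intdiv(strlen($cut), 3);
$danglingBytes = strlen($cut) % 3;
// preg_match() with the /u modifier returns false on malformed UTF-8 input.
$cutIsValidUtf8 = preg_match('//u', $cut) === 1;

printf("%d characters, %d bytes\n", $chars, $bytes);
printf(
    "after a 256-byte cut: %d complete characters, %d dangling byte(s), valid UTF-8: %s\n",
    $completeChars,
    $danglingBytes,
    $cutIsValidUtf8 ? 'yes' : 'no'
);
```

A name cut this way is no longer valid UTF-8, so it cannot be mapped back to the entry on disk.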

Test script:
---------------
<?php

$filename = str_repeat('テスト', 48); // 144 glyphs here, fewer than 256
var_dump($filename); // 432 bytes here, more than 256

mkdir(__DIR__ . '\\' . $filename); // created correctly

$d = dir(__DIR__);
while (false !== ($entry = $d->read())) {
    var_dump($entry);
}
$d->close();

Expected result:
----------------
string(432) "テストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテスト"
string(1) "."
string(2) ".."
string(5) "1.php"
string(5) "1.txt"
string(432) "テストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテスト"

Actual result:
--------------
string(432) "テストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテスト"
string(1) "."
string(2) ".."
string(5) "1.php"
string(5) "1.txt"
string(256) "テストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテストテ" // truncated by 0xE5

 [2017-01-22 06:47 UTC] ademybiz at gmail dot com
-Operating System: Windows +Operating System: Windows 10(10586)
 [2017-01-22 06:47 UTC] ademybiz at gmail dot com
PHP 7.1.1 (cli) (built: Jan 18 2017 18:38:28) ( NTS MSVC14 (Visual C++ 2015) x64 )
Copyright (c) 1997-2017 The PHP Group
Zend Engine v3.1.0, Copyright (c) 1998-2017 Zend Technologies
    with Zend OPcache v7.1.1, Copyright (c) 1999-2017, by Zend Technologies
 [2017-01-22 06:47 UTC] ademybiz at gmail dot com
-Summary: Filename get limited to MAX_PATH on Win32 when scan directory +Summary: Filename got limited to MAX_PATH on Win32 when scan directory
 [2017-01-22 22:17 UTC] ab@php.net
-Status: Open +Status: Verified -Assigned To: +Assigned To: ab
 [2017-01-22 22:17 UTC] ab@php.net
Thanks for the report. You're right, this particular case was overlooked and a porting mistake happened :( Unfortunately, a fix for 7.1 might not be possible due to our bugfixing policies. The corresponding data structure declares "char d_name[_MAX_FNAME+1]" and is exported, so it could already be in use in external code and extensions. I pushed a simple fix to master, see 609507024f3e8da406d71aaabd04c1624c01ada6, changing the size to _MAX_FNAME*4+1 to account for the UTF-8 conversion.

To be clear, with the wide API _MAX_FNAME means the number of wchar_t's, not bytes. So when the conversion to UTF-8 happens, the buffer has to be quadrupled. For 5 byte encodings this will still not be enough, though. In any case, I think it makes sense to implement a POSIX compatible "struct dirent" in this part. Please see the notes about the d_name member here: http://man7.org/linux/man-pages/man3/readdir.3.html . For now, in master, I just set it to a multiple of 4, but a refactoring is indeed needed to have it dynamically allocated, so that an arbitrary conversion can work. This is what the man page suggests.
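
The sizing argument above can be illustrated with a little arithmetic. This is only a sketch, not code from the fix: utf16_units() is a hypothetical helper that assumes valid UTF-8 input, and the script file is assumed to be saved as UTF-8. It compares how many UTF-16 code units (wchar_t's on Windows) a name occupies with how many bytes its UTF-8 form needs, next to the 4x bound used for the buffer. For UTF-8 as defined in RFC 3629, a code point takes at most 4 bytes.

```php
<?php
// Hypothetical helper: count the UTF-16 code units a valid UTF-8 string
// would occupy, by looking at lead bytes only.
function utf16_units(string $utf8): int
{
    $units = 0;
    $len = strlen($utf8);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($utf8[$i]);
        if (($byte & 0xC0) === 0x80) {
            continue; // continuation byte, belongs to the previous code point
        }
        // A lead byte of 0xF0 or above starts a 4-byte sequence, i.e. a code
        // point above U+FFFF, which needs a surrogate pair (2 units) in UTF-16.
        $units += ($byte >= 0xF0) ? 2 : 1;
    }
    return $units;
}

$samples = ['abc', 'テスト', "\u{1F600}"];
foreach ($samples as $s) {
    $units = utf16_units($s);
    $bytes = strlen($s);
    printf(
        "%d UTF-16 unit(s) -> %d UTF-8 byte(s), 4x bound: %d\n",
        $units,
        $bytes,
        4 * $units
    );
}
```

In these samples the UTF-8 byte count never exceeds three times the number of UTF-16 units, so a buffer of four bytes per unit leaves headroom; a dynamically allocated name avoids having to pick a multiplier at all.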

From what I can tell, however, and I guess it's the main reason why it was overlooked (given this part didn't even have a test before): running the supplied code under the Linux distribution I use returns a "path is too long" error. The linked man page also says that on most systems the dir name is limited to literally 256 bytes. So from this POV the issue is cross-platform, while of course it's fully correct to say that the OS capabilities should be exhausted where possible. For now a preliminary fix is in master, and I'm going to target 7.2 with some more extensive refactoring. Changing this in 7.1 is an ABI break; I'm currently investigating how much impact it would actually have. The safest option is not to change 7.1, as the actual long path functionality is not impacted.

Thanks.
 [2017-01-23 11:58 UTC] ademybiz at gmail dot com
Thanks for the quick reply. Looking forward to a robust solution in the next major release.
 [2017-02-13 07:40 UTC] ab@php.net
-Status: Verified +Status: Closed
 [2017-02-13 07:40 UTC] ab@php.net
The fix for this bug has been committed.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

 For Windows:

http://windows.php.net/snapshots/
 
Thank you for the report, and for helping us make PHP better.


 
Last updated: Thu Nov 21 11:01:29 2024 UTC