Doc Bug #81488 ext/zip doesn't extract files with special names
Submitted: 2021-09-29 12:38 UTC Modified: 2021-10-04 10:40 UTC
From: cmb@php.net Assigned:
Status: Closed Package: Zip Related
PHP Version: 7.4Git-2021-09-29 (Git) OS: Windows
Private report: No CVE-ID: None
 [2021-09-29 12:38 UTC] cmb@php.net
Description:
------------
On Windows, ZipArchive::extractTo() fails to extract files whose
names contain characters which are not allowed on NTFS file
systems, namely <|>*?":, or which end in trailing dots.  Windows'
built-in extraction tool accepts a few of them (the exact
treatment is apparently version dependent), but where it works, the
special characters are replaced with an underscore.  7-zip replaces
all these characters with an underscore, and that actually appears
to be the desired behavior.

Note that files with a colon are actually extracted, but since a
colon marks an NTFS stream, the file shows up under a name that
consists only of the part before the colon. This is undesirable,
especially since PHP has only partial support for NTFS streams
(see bug #81339).
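
The replacement described above can be sketched in userland PHP. This is only an illustration of the mapping shown under "Expected result" below; sanitize_windows_filename() is a hypothetical helper, not part of ext/zip or libzip, and replacing each trailing dot with its own underscore is an assumption (the report only shows the single-dot case).

```php
<?php
// Hypothetical helper (not an ext/zip API): maps a single file name to
// the form a 7-zip-like replacement would presumably produce.
function sanitize_windows_filename(string $name): string
{
    // Replace each of the characters < | > * ? " : with an underscore.
    $name = str_replace(['<', '|', '>', '*', '?', '"', ':'], '_', $name);

    // Assumption: every trailing dot becomes one underscore.
    $base = rtrim($name, '.');
    return $base . str_repeat('_', strlen($name) - strlen($base));
}

echo sanitize_windows_filename('foo:bar7'), "\n";
echo sanitize_windows_filename('foobar8.'), "\n";
```

The helper only handles a single path component; directory separators are left untouched.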

Test script:
---------------
<?php
$filenames = ["foo<bar1", "foo>bar2", "foo|bar3", "foo*bar4", "foo?bar5", "foo\"bar6", "foo:bar7", "foobar8."];
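// Create an archive whose entry names contain the problematic characters.
$zip = new ZipArchive();
$zip->open(__DIR__ . "/test.zip", ZipArchive::CREATE|ZipArchive::OVERWRITE);
foreach ($filenames as $filename) {
    $zip->addFromString($filename, "yada yada");
}
$zip->close();
// Extract each entry individually, then list what actually arrived.
mkdir(__DIR__ . "/extract");
$zip->open(__DIR__ . "/test.zip");
foreach ($filenames as $filename) {
    $zip->extractTo(__DIR__ . "/extract", $filename);
}
$zip->close();
print_r(scandir(__DIR__ . "/extract"));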
?>


Expected result:
----------------
Array
(
    [0] => .
    [1] => ..
    [2] => foo_bar1
    [3] => foo_bar2
    [4] => foo_bar3
    [5] => foo_bar4
    [6] => foo_bar5
    [7] => foo_bar6
    [8] => foo_bar7
    [9] => foobar8_
)


Actual result:
--------------
Array
(
    [0] => .
    [1] => ..
    [2] => foo
)


History

 [2021-09-29 12:38 UTC] cmb@php.net
-Assigned To: +Assigned To: cmb
 [2021-09-29 12:55 UTC] remi@php.net
@cmb I'm a bit reluctant to fix this in the ext/zip side.

If this is a general issue on Windows, seems better to discuss this with libzip upstream and fix it there (if wanted)

And, BTW, what will happen with :
$filenames = ["foo<bar", "foo>bar", "foo|bar", "foo*bar", "foo?bar", "foo\"bar", "foo:bar", "foo_bar"];
 [2021-09-29 13:58 UTC] cmb@php.net
> If this is a general issue on Windows, seems better to discuss
> this with libzip upstream and fix it there (if wanted)

Hmm, that would also mean that the modified names are shown when
accessing them (e.g. ZipArchive::getNameIndex()).  Might actually
be better.  I filed <https://github.com/nih-at/libzip/issues/263>.

> And, BTW, what will happen with :
> $filenames = ["foo<bar", "foo>bar", "foo|bar", "foo*bar", "foo?bar", "foo\"bar", "foo:bar", "foo_bar"];

The last one wins.  But that already happens with e.g.

    $filenames = ["/foobar", "./foobar", "foobar"]
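
To make the "last one wins" case visible before extracting, one could group entry names by the name they would end up with. This is a rough sketch only; find_name_collisions() is a hypothetical helper, and it covers just the character replacement discussed in this report, not the path normalization behind the "/foobar" example.

```php
<?php
// Hypothetical helper: returns [replaced name => [original names...]]
// for every replaced name that more than one entry would map to.
function find_name_collisions(array $names): array
{
    $groups = [];
    foreach ($names as $name) {
        $safe = str_replace(['<', '|', '>', '*', '?', '"', ':'], '_', $name);
        $groups[$safe][] = $name;
    }
    // Keep only the groups in which entries would overwrite each other.
    return array_filter($groups, function (array $group): bool {
        return count($group) > 1;
    });
}

$filenames = ["foo<bar", "foo>bar", "foo|bar", "foo*bar", "foo?bar", "foo\"bar", "foo:bar", "foo_bar"];
print_r(find_name_collisions($filenames));
```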
 [2021-10-04 10:40 UTC] cmb@php.net
-Status: Assigned
+Status: Open
-Type: Bug
+Type: Documentation Problem
-Assigned To: cmb
+Assigned To:
 [2021-10-04 10:40 UTC] cmb@php.net
After some short discussion on the PR, I think it's better to
leave this as is, and to document the behavior.
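
Since the behavior is left as is, scripts that need such entries on Windows have to work around it themselves. The following is a minimal sketch of one possible approach, assuming a flat archive (entries with a path separator are skipped); extract_with_safe_names() is a hypothetical userland function, not something ext/zip provides, and it reads each entry fully into memory.

```php
<?php
// Hypothetical workaround (not an ext/zip API): writes each entry under
// a name with the special characters replaced by underscores.
// Returns a map of [original entry name => file name written].
function extract_with_safe_names(ZipArchive $zip, string $targetDir): array
{
    $written = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // Skip unreadable or empty names and anything with a path component.
        if ($name === false || $name === '' || strpbrk($name, '/\\') !== false) {
            continue;
        }
        $data = $zip->getFromIndex($i);
        if ($data === false) {
            continue;
        }
        // Replace < | > * ? " : and (by assumption) each trailing dot.
        $safe = str_replace(['<', '|', '>', '*', '?', '"', ':'], '_', $name);
        $base = rtrim($safe, '.');
        $safe = $base . str_repeat('_', strlen($safe) - strlen($base));

        $path = $targetDir . DIRECTORY_SEPARATOR . $safe;
        if (file_put_contents($path, $data) !== false) {
            $written[$name] = $safe;
        }
    }
    return $written;
}
```

As with any such renaming, entries that map to the same replaced name overwrite each other, so the last one wins.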
 [2021-10-04 13:25 UTC] git@php.net
Automatic comment on behalf of cmb69
Revision: https://github.com/php/doc-en/commit/5e1a9062c0381edf0d70b172ffd81f668a53b6b6
Log: Fix #81488: ext/zip doesn't extract files with special names
 [2021-10-04 13:25 UTC] git@php.net
-Status: Open +Status: Closed
 [2021-10-08 09:13 UTC] git@php.net
Automatic comment on behalf of Girgias (author) and web-flow (committer)
Revision: https://github.com/php/doc-fr/commit/e1051e00d78ba63e64e8cea548ed7d256c8e59b9
Log: Apply 5e1a9062c0381edf0d70b172ffd81f668a53b6b6 (#108)
 