Doc Bug #72258 ZipArchive converts filenames to unrecoverable form
Submitted: 2016-05-24 04:55 UTC Modified: 2016-07-19 09:59 UTC
From: speller at yandex dot ru Assigned: ab (profile)
Status: Closed Package: Zip Related
PHP Version: 5.6.21 OS: Windows 7
Private report: No CVE-ID: None
 [2016-05-24 04:55 UTC] speller at yandex dot ru
Description:
------------
I have a test zip archive containing a single directory named АБВГД, created under Windows with a Russian locale. Viewing the zip file in a hex editor shows the directory name as "80 81 82 83 84 2F" (АБВГД/): five 1-byte characters plus a slash. Under PHP 5.4, ZipArchive::getNameIndex returned the name from this archive as is, in cp866 encoding, so I could convert it to correct text. After upgrading to PHP 5.6, the function returns names transcoded into something unrecoverable: PHP internally converts the result of ZipArchive::getNameIndex to five *2-byte* characters plus a slash! I tried setting the default_charset ini setting, without success. As a result, I cannot work properly with such archives at all.

Run the test script under the precompiled PHP 5.4 from php.net and under the equivalent PHP 5.6 build, and you will see different results:




Test script:
---------------
<?php
echo setlocale(LC_ALL, 'Russian_Russia.20866'), "\n";
ini_set('default_charset', 'cp866');

$zip = new \ZipArchive();
$res = $zip->open('test.zip');
if ($res !== true) {
	echo 'Error opening: ' . $res;
	die();
}


for ($i = 0; $i < $zip->numFiles; $i++) {
	$stat = $zip->statIndex($i);
	$fileNameInArc = $zip->getNameIndex($i);

	echo $fileNameInArc, "\n";
	echo iconv('cp866', 'utf-8', $fileNameInArc), "\n";
}


Expected result:
----------------
php.5.4.32\php test.php
Russian_Russia.20866
АБВГД/
╨Р╨С╨Т╨У╨Ф/


Actual result:
--------------
php.5.6.21\php test.php
Russian_Russia.20866
├З├╝├й├в├д/
тФЬ╨ЧтФЬтХЭтФЬ╨╣тФЬ╨▓тФЬ╨┤/


 [2016-05-24 10:29 UTC] cmb@php.net
That might be related to bug #72200.
 [2016-05-24 11:49 UTC] speller at yandex dot ru
-Summary: ZipArchive converts filenames to unrecoverableform +Summary: ZipArchive converts filenames to unrecoverable form
 [2016-05-24 11:49 UTC] speller at yandex dot ru
Bug #72200 is about creating a new archive and adding files to it. This issue is about reading an archive. Also, I would like to draw your attention to the fact that two versions of the official PHP binary produce *different* results.
 [2016-05-25 18:52 UTC] ab@php.net
-Status: Open +Status: Analyzed
 [2016-05-25 18:52 UTC] ab@php.net
Thanks for the report. The issue stems from a discrepancy between the libzip versions used, not from PHP itself. libzip is moving ever more toward UTF-8, making it the default in various areas. The libzip version used in PHP 5.6 requires the encoding flags to be passed explicitly, and those flags were missed when PHP was synchronized with the new libzip version.

@speller, a workaround for now:

$fileNameInArc = $zip->getNameIndex($i, 64);

Otherwise the filename will be converted to UTF-8 using the current ACP (ANSI code page), which can yield quite weird results.

Thanks.
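
The byte arithmetic behind the garbled output can be checked by hand. The sketch below assumes the raw name bytes were read as CP437 and re-encoded as UTF-8. That code page is an inference from the actual result shown in the report, not something stated in this thread. It also assumes PHP 7 or later and an iconv build that knows CP437. The helper names garbleLikeReport and recoverName are hypothetical.

```php
<?php
// Raw directory name as stored in the archive: five cp866 bytes plus a slash.
$raw = "\x80\x81\x82\x83\x84/";

/**
 * Hypothetical helper: mimic the conversion seen in the report by treating
 * the bytes as CP437 (an assumption) and re-encoding them as UTF-8.
 */
function garbleLikeReport(string $raw): string
{
    return iconv('CP437', 'UTF-8', $raw);
}

/**
 * Hypothetical helper: undo that conversion, then decode the recovered
 * bytes with the encoding the archive was really written in.
 */
function recoverName(string $garbled, string $realEncoding = 'CP866'): string
{
    $bytes = iconv('UTF-8', 'CP437', $garbled);
    return iconv($realEncoding, 'UTF-8', $bytes);
}

$garbled = garbleLikeReport($raw);
echo strlen($raw), ' -> ', strlen($garbled), " bytes\n"; // 6 -> 11
echo bin2hex($garbled), "\n";   // each name byte became a 2-byte sequence
echo recoverName($garbled), "\n"; // UTF-8 text of the original name
```

Under these assumptions the eleven bytes, viewed in a cp866 console, read as the first line of the actual result, and the round trip gives back the original name as UTF-8. Reading the raw bytes with flag 64 remains the simpler route.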
 [2016-05-25 18:52 UTC] ab@php.net
-Assigned To: +Assigned To: ab
 [2016-05-30 12:06 UTC] ab@php.net
-Type: Bug +Type: Documentation Problem
 [2016-05-30 12:06 UTC] ab@php.net
As of the next 7.0 release, it will be possible to use flag constants for filename encoding (see UPGRADING). In this case, ZipArchive::FL_ENC_RAW is the useful one. Thus, I am turning this into a documentation issue.

Thanks.
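
With the constant available, the read loop from the test script might look like the sketch below. It assumes a PHP 7.0 release that ships ZipArchive::FL_ENC_RAW, the zip and iconv extensions, and an archive whose names are stored in cp866. The function name listArchiveNames and its parameters are hypothetical, not part of ZipArchive.

```php
<?php
/**
 * Hypothetical helper: list the entry names of an archive as UTF-8 by
 * reading the stored bytes untouched and decoding them from an assumed
 * source encoding.
 */
function listArchiveNames(string $path, string $nameEncoding = 'CP866'): array
{
    $zip = new ZipArchive();
    $res = $zip->open($path);
    if ($res !== true) {
        throw new RuntimeException('Error opening archive, code ' . $res);
    }

    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        // FL_ENC_RAW (value 64) returns the name bytes exactly as stored.
        $raw = $zip->getNameIndex($i, ZipArchive::FL_ENC_RAW);
        $names[] = iconv($nameEncoding, 'UTF-8', $raw);
    }
    $zip->close();

    return $names;
}

// Usage, with the archive from the report:
// print_r(listArchiveNames('test.zip'));
```

The source encoding is a parameter because the archive itself does not record it. The caller has to know or guess which code page the creating system used.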
 [2016-05-30 21:15 UTC] ab@php.net
-Assigned To: ab +Assigned To:
 [2016-07-19 09:59 UTC] ab@php.net
-Status: Analyzed +Status: Closed -Assigned To: +Assigned To: ab
 [2016-07-19 09:59 UTC] ab@php.net
The documentation is now online, thanks Remi.
 
Last updated: Thu Nov 21 15:01:30 2024 UTC