Bug #51929 Extracted a zip file that contains a folder named with Chinese characters
Submitted: 2010-05-27 13:36 UTC Modified: 2013-06-07 06:01 UTC
Votes:32
Avg. Score:4.9 ± 0.4
Reproduced:29 of 30 (96.7%)
Same Version:6 (20.7%)
Same OS:7 (24.1%)
From: kgo_yoi at hotmail dot com Assigned: pajoye
Status: No Feedback Package: Zip Related
PHP Version: 5.2.13 OS: Windows XP Simplified Chinese
Private report: No CVE-ID: None
 [2010-05-27 13:36 UTC] kgo_yoi at hotmail dot com
Description:
------------
When I extract a zip file that contains a folder named with Chinese characters, the files that belong in that folder are no longer inside it.

For example, I have two zip files. The first, en.zip, contains a folder named English:

en.zip
     |- English
           |- en.txt


The second, zh.zip, contains a folder named 中文 ("Chinese"):

zh.zip
     |- 中文
         |- zh.txt


Then I use ZipArchive to extract both of them.

Test script:
---------------
<?php
$zip = new ZipArchive;

$zip->open('en.zip');
$zip->extractTo('.');
$zip->close();

$zip->open('zh.zip');
$zip->extractTo('.');
$zip->close();

Expected result:
----------------
Two folders, 'English' and '中文', each containing its txt file.

Actual result:
--------------
The 'English' folder contains en.txt, but the '中文' folder is empty and zh.txt is extracted outside it.
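To reproduce this without the reporter's archives, here is a self-contained sketch (assuming PHP 7.1 or later, a loaded zip extension, and a writable system temp directory) that builds en.zip and zh.zip with `ZipArchive::addEmptyDir()` and `ZipArchive::addFromString()`, extracts each one, and reports whether the txt file landed inside its folder. The helper `buildAndExtract()` and the temp-directory layout are hypothetical, not part of the original report.

```php
<?php
// Hypothetical helper: build a small archive with one folder and one file,
// extract it into $dest, and report whether the file landed inside the folder.
function buildAndExtract(string $dest, string $zipName, string $folder, string $file): bool
{
    $zipPath = $dest . '/' . $zipName;

    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $zipPath");
    }
    $zip->addEmptyDir($folder);                       // folder entry, e.g. "中文/"
    $zip->addFromString("$folder/$file", "sample\n"); // file stored inside that folder
    $zip->close();

    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("Cannot reopen $zipPath");
    }
    $extracted = $zip->extractTo($dest);
    $zip->close();

    return $extracted && is_file("$dest/$folder/$file");
}

$dest = sys_get_temp_dir() . '/bug51929_' . uniqid();
mkdir($dest);

foreach ([['en.zip', 'English', 'en.txt'], ['zh.zip', '中文', 'zh.txt']] as [$zipName, $folder, $file]) {
    $inside = buildAndExtract($dest, $zipName, $folder, $file);
    echo "$zipName: $file is " . ($inside ? '' : 'NOT ') . "inside $folder", PHP_EOL;
}
```

On a system whose filesystem APIs take UTF-8 names, both lines should report the file as inside its folder; the failure in this report was seen on Windows XP Simplified Chinese with PHP 5.2.13.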

Patches

fix_zip_unicode (last revision 2011-02-23 17:02 UTC by asaf at lingnu dot com)

History

 [2011-02-18 13:58 UTC] g dot faust at tarent dot de
As with Bug #53948 (ZIP archive UTF-8 filenames problem), the problem seems to be that the zip implementation expects CP437 as the encoding for file and folder names.

As a workaround, use

$newFileOrFoldername = iconv('UTF-8', 'CP437//IGNORE//TRANSLIT', $oldFileOrFoldername);

You have to check that your iconv() supports CP437.
With this workaround it works* on Windows 7 (64-bit) and Ubuntu 10.04 (64-bit), but the Archive Manager under Ubuntu 10.04 (32-bit) still shows the wrong characters.

*) That is, it works with the converted filename, in which characters that have no CP437 equivalent are transliterated or even dropped, so this may not be a workaround for Chinese characters at all.
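The footnote's caveat can be made explicit. The sketch below is an illustration, not the commenter's code: it converts a UTF-8 entry name to CP437 without `//IGNORE` or `//TRANSLIT`, so `iconv()` returns `false` instead of silently dropping characters, and the hypothetical helper `toCp437EntryName()` reports that case as `null`. It assumes PHP 7.1+ and an iconv implementation that knows CP437.

```php
<?php
// Hypothetical helper: convert a UTF-8 entry name to CP437 for archivers that
// assume CP437, but return null instead of silently dropping characters that
// CP437 cannot represent (such as Chinese ideographs).
function toCp437EntryName(string $utf8Name): ?string
{
    // No //IGNORE or //TRANSLIT: iconv() fails (returns false) on the first
    // character without a CP437 mapping; @ suppresses the accompanying notice.
    $converted = @iconv('UTF-8', 'CP437', $utf8Name);
    return $converted === false ? null : $converted;
}

foreach (['Ärger.txt', '中文/zh.txt'] as $candidate) {
    $cp437 = toCp437EntryName($candidate);
    // With an open ZipArchive one would then call $zip->addFile($path, $cp437).
    echo $candidate, ' => ', $cp437 === null ? 'not representable in CP437' : bin2hex($cp437), PHP_EOL;
}
```

Latin names such as `Ärger.txt` survive (Ä is byte 0x8E in CP437), while a Chinese name yields `null`, which matches the warning that this is not a workaround for Chinese characters.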
 [2011-02-18 14:04 UTC] pajoye@php.net
-Status: Open +Status: Duplicate
 [2011-02-18 14:04 UTC] pajoye@php.net
The library does nothing but take the names you give it as input.

The problem also shows up when a zip archive contains entries named using UTF-8 or some other arbitrary encoding. These names are passed unchanged to the filesystem APIs, which expect strings encoded in the active runtime code page.
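Following that explanation, one workaround is to skip `extractTo()` and extract entry by entry, converting each stored name to whatever encoding the filesystem APIs expect before creating any files. A minimal sketch, assuming PHP 7.1+ and a caller-supplied conversion callback; `extractWithConvertedNames()` is a hypothetical helper, and its check for `..` segments and leading slashes is a basic precaution, not a complete path sanitizer.

```php
<?php
// Hypothetical helper: extract every entry by hand so each stored name can be
// re-encoded (via $convertName) before it reaches the filesystem functions,
// instead of letting extractTo() pass the stored bytes through unchanged.
function extractWithConvertedNames(ZipArchive $zip, string $dest, callable $convertName): void
{
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $rawName = $zip->getNameIndex($i);
        if ($rawName === false) {
            throw new RuntimeException("Cannot read name of entry $i");
        }
        $name = str_replace('\\', '/', $convertName($rawName));

        // Refuse names starting with "/" or containing ".." segments so a
        // crafted entry cannot be written outside $dest.
        if ($name === '' || $name[0] === '/' || in_array('..', explode('/', $name), true)) {
            throw new RuntimeException("Unsafe entry name: $name");
        }

        $target = $dest . '/' . $name;
        if (substr($name, -1) === '/') {          // directory entry
            if (!is_dir($target)) {
                mkdir($target, 0777, true);
            }
            continue;
        }
        if (!is_dir(dirname($target))) {          // parent may lack its own entry
            mkdir(dirname($target), 0777, true);
        }
        $contents = $zip->getFromIndex($i);
        if ($contents === false || file_put_contents($target, $contents) === false) {
            throw new RuntimeException("Cannot extract $name");
        }
    }
}
```

For example, names stored as UTF-8 could be passed through `fn($n) => iconv('UTF-8', 'CP936', $n)` on a Simplified Chinese Windows system running a PHP version whose filesystem functions still use the ANSI code page (an assumption about the target setup). On newer versions of the extension, `getNameIndex()` may itself guess and convert the stored encoding; passing `ZipArchive::FL_ENC_RAW` as its second argument asks for the unmodified bytes.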
 [2011-02-18 14:05 UTC] pajoye@php.net
-Status: Duplicate +Status: Assigned -Assigned To: +Assigned To: pajoye
 [2011-02-18 14:05 UTC] pajoye@php.net
At some point the library will support UTF-8 (the zip specification allows it), but it is not implemented yet.
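The part of the ZIP specification that allows this is the "language encoding flag": bit 11 (0x0800) of the general purpose bit flag, present in each local file header and central directory entry, which marks the entry name as UTF-8. A minimal sketch that reads that bit from a raw local file header with `unpack()`; both helper names are hypothetical, and `makeLocalHeader()` only builds a synthetic header for testing.

```php
<?php
// Sketch: read the general purpose bit flag from a ZIP local file header and
// test bit 11, the "language encoding flag" that marks the entry name as UTF-8.
// Fixed part (30 bytes, little-endian): signature(4) version(2) flags(2)
// method(2) time(2) date(2) crc32(4) compressed(4) uncompressed(4)
// name_len(2) extra_len(2), followed by the name and extra field.
function localHeaderNameIsUtf8(string $header): bool
{
    if (strlen($header) < 30) {
        throw new InvalidArgumentException('A local file header is at least 30 bytes');
    }
    $fields = unpack('Vsignature/vversion/vflags', $header);
    if ($fields['signature'] !== 0x04034b50) {
        throw new InvalidArgumentException('Not a local file header');
    }
    return ($fields['flags'] & 0x0800) !== 0; // bit 11
}

// Hypothetical test helper: a header for a stored, zero-length entry.
function makeLocalHeader(string $name, int $flags): string
{
    return pack('VvvvvvVVVvv', 0x04034b50, 20, $flags, 0, 0, 0, 0, 0, 0, strlen($name), 0) . $name;
}
```

For the first entry of an ordinary archive (no self-extractor stub prepended), the local header starts at offset 0, so `file_get_contents('zh.zip', false, null, 0, 30)` would give the bytes to pass in.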
 [2011-05-30 08:51 UTC] mikhail dot v dot gavrilov at gmail dot com
I can also confirm this problem.

Workaround: converting the names to your DOS code page helps.
 [2011-08-24 05:36 UTC] chrodos at gmail dot com
I can also confirm this bug with Greek filenames. It has severe implications for popular open-source applications such as Moodle.

Please read the related bug: http://tracker.moodle.org/browse/MDL-24928
 [2012-06-08 03:53 UTC] maxspeed40k at gmail dot com
Windows 7 (32-bit)
PHP 5.4
 [2012-06-11 12:11 UTC] mikhail dot v dot gavrilov at gmail dot com
I don't think this is a ZipArchive problem. It is a problem with the zip archive format, which does not store the source code page, so extracting software interprets the names differently depending on the platform (Windows uses the DOS code page, e.g. CP866 for Cyrillic; Linux uses UTF-8).
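When an archive does not record which encoding its names use, a reader has to guess. One common heuristic, sketched below under stated assumptions: keep the name if it is valid UTF-8, otherwise convert it from a legacy code page the caller picks (CP866 for Cyrillic names, as in the comment above; CP437 as the default). `decodeEntryName()` is a hypothetical helper, and the heuristic can misfire because some legacy byte sequences also happen to be valid UTF-8.

```php
<?php
// Sketch: interpret a raw entry name when the archive does not say which code
// page it used. Valid UTF-8 is kept as-is; anything else is assumed (an
// assumption, not something the archive states) to be in $legacyCodePage.
function decodeEntryName(string $raw, string $legacyCodePage = 'CP437'): string
{
    // preg_match() with the /u modifier returns false on invalid UTF-8 input.
    if (preg_match('//u', $raw) === 1) {
        return $raw;
    }
    $utf8 = iconv($legacyCodePage, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Cannot decode entry name from $legacyCodePage");
    }
    return $utf8;
}
```

For example, the CP866 bytes `"\x8F\xE0\xA8\xA2\xA5\xE2"` are not valid UTF-8, so `decodeEntryName($raw, 'CP866')` converts them and returns `Привет`.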
 [2012-07-24 17:09 UTC] gazambuja at gmail dot com
Any news on this bug?
It also affects extracting zip files whose filenames use non-English encodings.
 [2013-01-08 07:46 UTC] oburlaca at gmail dot com
Confirming that extracting an archive with UTF-8 filenames doesn't work on:
PHP 5.4.10
CentOS 6.3
 [2013-04-01 21:35 UTC] pajoye@php.net
Please try using master at https://github.com/pierrejoye/php_zip

UTF-8 support for archive entry names has been added.
 [2013-04-01 21:35 UTC] pajoye@php.net
-Status: Assigned +Status: Feedback
 [2013-06-07 06:01 UTC] pajoye@php.net
-Status: Feedback +Status: No Feedback
 [2015-12-11 08:27 UTC] 350176528 at qq dot com
Hi pajoye, I've tried your code from https://github.com/pierrejoye/php_zip.
I ran
phpize
./configure
make
make install
and got a new zip.so,

but the problem still exists:

$zip->addFile('/some/path/file','中文名');
does not work (中文名 means "Chinese name"), and

$zip->addFile('/some/path/file',iconv("UTF-8","CP437//IGNORE//TRANSLIT",'中文名'));
does not work either.

My PHP version is 5.5.9 and my OS is Ubuntu 14.04.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sun Dec 22 05:01:30 2024 UTC