Bug #46456 zip_entry_name() mangling national characters
Submitted: 2008-11-01 17:31 UTC
Modified: 2008-11-04 15:03 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: mkurpel at gmail dot com
Assigned: pajoye
Status: Not a bug
Package: Zip Related
PHP Version: 5.2.6
OS: FreeBSD 6.2-RELEASE-p8
Private report: No
CVE-ID: None
 [2008-11-01 17:31 UTC] mkurpel at gmail dot com
Description:
------------
I am reading a ZIP file containing one XLS file named "??軾???????????????? ??ȫ?????????????????.xls". However, zip_entry_name() returns mangled characters. I am working in UTF-8, so I pass the name through utf8_encode() before echoing it. All my PHP files are saved in UTF-8 too.
Non-national characters are returned just fine.

Reproduce code:
---------------
header('Content-Type: text/html; charset=utf-8');
setlocale(LC_ALL, 'sk_SK.utf8');
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');

$zip = zip_open(realpath($somezipfile));
while ($zip_entry = @zip_read($zip))
{
	echo $path = utf8_encode(zip_entry_name($zip_entry));
}
@zip_close($zip);


Expected result:
----------------
should echo this string: ??軾???????????????? ??ȫ?????????????????.xls

Actual result:
--------------
echoed this string: –?Ÿœ?ì ¡‚£“å„¢Ø??’”‰ •æ¬›¦?µ???ÕŽà·Ò?‘™?š.xls

 [2008-11-01 17:33 UTC] mkurpel at gmail dot com
The characters in Actual results got converted into html entities in this bug tracking system. The script has output actual characters, not entities.
 [2008-11-01 18:16 UTC] derick@php.net
First of all, don't use utf8_encode... does it work then?
 [2008-11-01 19:01 UTC] mkurpel at gmail dot com
No, it does not, it outputs some garbled characters :(
 [2008-11-03 08:37 UTC] jani@php.net
Ditch those mb_* calls and remove that setlocale() call too. Then replace utf8_encode() with utf8_decode().
 [2008-11-03 18:38 UTC] mkurpel at gmail dot com
Jani, tried it with no luck again, this was the output:
http://obrazok.eu/files/bg05v7lyfto1i883czzs.png
I have tried many more things, such as iconv and mb_convert_encoding but nothing helped. And I cannot replace those mangled characters by the desirable ones since zip_entry_name() returns the same mangled character for different national characters.
 [2008-11-03 18:56 UTC] pajoye@php.net
Please post a link to the zip archive you use for your tests as well as an image to show us how the text should be displayed (if it is using non ascii chars :).
 [2008-11-03 19:48 UTC] mkurpel at gmail dot com
You can find the zip file here:
http://kotuha.com/file/zWio7-aaa.html
There is an XLS file inside named ??軾???????????????? ??ȫ?????????????????.xls, and the script should display that filename exactly as it is (with national characters).
With the code originally provided it looks like this:
http://obrazok.eu/files/6hhmttlsmiwezc8gmo6m.png
All non-national characters (if there were any in the xls filename) are retained, national characters get garbled.
 [2008-11-04 13:08 UTC] pajoye@php.net
Here is a screen shot:

http://pierre.libgd.org/zip/46456.png

The console output comes from the script below; the GUI is WinRAR (WinZip and Windows compressed folders show the same).

<?php
$somezipfile = '46456.zip';
$zip = zip_open(realpath($somezipfile));
while ($zip_entry = zip_read($zip))
{
	echo zip_entry_name($zip_entry);
}
zip_close($zip);


There is no difference, no bug > bogus. As Derick said earlier, you are messing with the encoding and trying to encode something that is already UTF-8.
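
To illustrate what encoding an already-UTF-8 string does, here is a minimal sketch. It assumes the name really is valid UTF-8 and uses a hypothetical one-letter sample; the helper name encodeAsIfLatin1() is made up for this example. mb_convert_encoding() from ISO-8859-1 stands in for utf8_encode(), which makes the same Latin-1 assumption about its input.

```php
<?php
/**
 * Re-encode a string as if it were ISO-8859-1,
 * which is the assumption utf8_encode() makes about its input.
 * (Hypothetical helper, for illustration only.)
 */
function encodeAsIfLatin1(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical sample: the letter U+013E already encoded as UTF-8 (2 bytes).
$alreadyUtf8 = "\xC4\xBE";

// Each of the two bytes is treated as its own Latin-1 character.
$doubleEncoded = encodeAsIfLatin1($alreadyUtf8);

echo bin2hex($alreadyUtf8), "\n";   // bytes before the extra encoding step
echo bin2hex($doubleEncoded), "\n"; // bytes after it: twice as many
```

Comparing the two hex dumps shows one character turning into two, which is the typical signature of double encoding.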



 [2008-11-04 14:18 UTC] mkurpel at gmail dot com
I tried the same script as pajoye sent here. Only added a second line with original filename for comparison:

<?php
$somezipfile = '46456.zip';
$zip = zip_open(realpath($somezipfile));
while ($zip_entry = zip_read($zip))
{
	echo zip_entry_name($zip_entry);
}
zip_close($zip);
?>
<br>
&#318;?&#269;&#357;???????&#328;??&#283;&#271;&#345;&#314;??? &#317;?&#268;&#356;???????&#327;??&#282;&#270;&#344;&#313;???.xls

Be sure to save it as ANSI (so we are not messing with utf-8 now). What it outputs is: http://obrazok.eu/files/79gro4t39dzzzr3mk6u7.png
The strings should be the same. I do not think what it outputs is okay.
 [2008-11-04 14:19 UTC] mkurpel at gmail dot com
My god, the string got converted to html entities again :-/
 [2008-11-04 14:21 UTC] pajoye@php.net
> Be sure to save it as ANSI (so we are not messing with utf-8 now).

The thing is, Zip does not convert anything; it gives you what the zip entry contains. If the console or your HTML page is not using the same encoding, the name is likely to be displayed badly.
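
One way to see what the entry actually contains, independent of how a console or browser renders it, is to dump the raw bytes and check whether they form valid UTF-8. The sketch below is only an illustration: describeEntryName() is a hypothetical helper, and the sample bytes are made up rather than taken from the archive in this report.

```php
<?php
/**
 * Summarise an entry name: its bytes in hex and whether they are valid UTF-8.
 * (Hypothetical helper, for illustration only.)
 */
function describeEntryName(string $rawName): array
{
    return [
        'hex'        => bin2hex($rawName),
        'valid_utf8' => mb_check_encoding($rawName, 'UTF-8'),
    ];
}

// Hypothetical raw name; in practice this would come from zip_entry_name().
$rawName = "\x96\x9F\x9C.xls";

print_r(describeEntryName($rawName));
```

If valid_utf8 is false, the name was stored in some other charset and has to be converted from that charset before it is sent to a UTF-8 page.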
 [2008-11-04 14:44 UTC] mkurpel at gmail dot com
Can you tell me what encoding the zip format uses? Maybe I could use iconv then... it must be possible somehow since the national chars are all okay when reopening the zip file in WinRAR.
 [2008-11-04 15:03 UTC] pajoye@php.net
Zip supports UTF-8 if some specific flag is set. But 99.999% of the time, the strings are stored as is, without any conversion (by many tools).
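
As a rough sketch of how a script could act on this: the flag in question is bit 11 (0x0800) of the entry's general purpose bit flag in the ZIP format. The function below assumes the caller has obtained that flag value by some other means, since the zip_* functions used in this report do not appear to expose it. The names zipNameToUtf8() and ZIP_FLAG_UTF8 are hypothetical, and the CP852 default is only a guess (the DOS code page for Central European languages); the right legacy charset depends on the tool and system that created the archive.

```php
<?php
// Bit 11 of the general purpose bit flag marks a UTF-8 encoded entry name.
const ZIP_FLAG_UTF8 = 0x0800;

/**
 * Return an entry name as UTF-8.
 * $generalPurposeFlags is assumed to come from the entry's header.
 * $legacyCharset is an assumption about what the archiving tool used.
 */
function zipNameToUtf8(
    string $rawName,
    int $generalPurposeFlags,
    string $legacyCharset = 'CP852'
): string {
    if (($generalPurposeFlags & ZIP_FLAG_UTF8) !== 0) {
        return $rawName; // the archive declares the name to be UTF-8 already
    }

    // Flag not set: the bytes were stored as is, so convert from the guessed charset.
    $converted = iconv($legacyCharset, 'UTF-8', $rawName);

    // Fall back to the raw bytes if the conversion is not possible.
    return $converted === false ? $rawName : $converted;
}

// Hypothetical raw name with the UTF-8 flag unset.
echo zipNameToUtf8("\x96\x9F\x9C.xls", 0), "\n";
```

The charset is a parameter rather than a constant because nothing in the archive says which one was used when the flag is unset.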

 