Doc Bug #80326 Phar may not properly extract PAX archives
Submitted: 2020-11-06 07:16 UTC Modified: 2021-01-27 18:32 UTC
From: corentin dot jacquemet at infomaniak dot com Assigned: cmb (profile)
Status: Closed Package: PHAR related
PHP Version: 7.4.12 OS: Debian10/Alpine
Private report: No CVE-ID: None
 [2020-11-06 07:16 UTC] corentin dot jacquemet at infomaniak dot com
Description:
------------
Download the official MediaWiki 1.35.0 tarball from the official website with wget, then try to extract the archive using Phar.

The latest mediawiki-1.35.0 archive is not extracted correctly by Phar.
Reproduced on PHP 7.0.33, 7.1.33, 7.2.34, 7.3.24 and 7.4.12, on Debian 10 and Alpine (official PHP Docker image).

"widgets" should be a directory, not a file.
With the tar command, the archive is extracted correctly.
If we extract it with the tar command and then pack the result into a new archive, Phar extracts that new archive correctly.


Test script:
---------------
<?php
// decompress from gz
$p = new PharData('mediawiki-1.35.0.tar.gz');
$p->decompress(); // creates xxx.tar

// unarchive from the tar
$phar = new PharData('mediawiki-1.35.0.tar');
$phar->extractTo('extracted_'.date('Y-m-d-H:i'));

Expected result:
----------------
$ ls -lah extracted_2020-11-06-06\:44/mediawiki-1.35.0/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/widgets
total 28
drwxr-xr-x    2 root     root        4.0K Nov  6 06:48 .
drwxr-xr-x    3 root     root        4.0K Nov  6 06:48 ..
-rw-rw-r--    1 1000     1000        2.3K Sep 25 14:38 LanguageResultWidget.js
-rw-rw-r--    1 1000     1000        3.6K Sep 25 14:38 LanguageSearchWidget.js
-rw-rw-r--    1 1000     1000        1.2K Sep 25 14:38 ParamImportWidget.js
-rw-rw-r--    1 1000     1000        1.0K Sep 25 14:38 ParamSelectWidget.js
-rw-rw-r--    1 1000     1000        1.7K Sep 25 14:38 ParamWidget.js

$ ls -lhd extracted_2020-11-06-06\:44/mediawiki-1.35.0/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/widgets
drwxr-xr-x    2 root     root        4.0K Nov  6 06:48 mediawiki-1.35.0/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/widgets

Actual result:
--------------
$ ls -lah extracted_2020-11-06-06\:44/mediawiki-1.35.0/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/widgets
-rw-rw-r--    1 root     root        1.7K Nov  6 06:44 extracted_2020-11-06-06:44/mediawiki-1.35.0/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/widgets

$ ls -lhd extracted_2020-11-06-06\:44/mediawiki-1.35.0/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/widgets
-rw-rw-r--    1 root     root        1.7K Nov  6 06:44 extracted_2020-11-06-06:44/mediawiki-1.35.0/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/widgets

 [2020-11-06 11:18 UTC] requinix@php.net
WinRAR doesn't like that directory either. This might be an upstream problem.

Ubuntu tar is happy, but I do see that widgets/ didn't seem to be bundled with timestamp data. And in your output, the subdirectory files are owned by UID 1000 instead of root.
 [2020-11-06 13:46 UTC] corentin dot jacquemet at infomaniak dot com
Using the find command, all files are owned by UID 1000, but all directories are owned by root. The same seems to apply to the timestamp data.
 [2021-01-27 18:32 UTC] cmb@php.net
-Summary: Phar does not decompress properly
+Summary: Phar may not properly extract PAX archives
-Status: Open
+Status: Verified
-Type: Bug
+Type: Documentation Problem
-Assigned To:
+Assigned To: cmb
 [2021-01-27 18:32 UTC] cmb@php.net
The point is that this tarball is stored in the pax interchange
format, which is not really supported by Phar; it recognizes pax
headers, but ignores them[1].  This is usually not a real problem
since most of the pax metadata are "minor" details, but in this
case the metadata also contain the full filename, while the
actual filename field is only 100 bytes long.

Since the name of the directory which is unpacked as a file is
exactly 100 bytes long (including the trailing /), the contained
files have the same name as far as Phar is concerned, and since the
contained files are stored after the directory in the tarball, that
name ends up being treated as a file.
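
The collision described above can be illustrated with a small sketch. It is a simplified model, not Phar's actual parser: it assumes the archive writer stored the first 100 bytes of each long path in the legacy name field, and that a reader ignoring the pax records sees only that field. The helper `legacyNameField()` and the constant are hypothetical names introduced for illustration.

```php
<?php
// Width of the legacy tar name field, in bytes.
const TAR_NAME_FIELD_BYTES = 100;

// Hypothetical helper: what a reader sees if it only looks at the
// 100-byte name field and ignores the pax "path" record.
function legacyNameField(string $fullPath): string
{
    return substr($fullPath, 0, TAR_NAME_FIELD_BYTES);
}

$dir = 'mediawiki-1.35.0/extensions/TemplateData/modules/'
     . 'ext.templateDataGenerator.editTemplatePage/widgets/';

$files = [
    'LanguageResultWidget.js',
    'LanguageSearchWidget.js',
    'ParamImportWidget.js',
    'ParamSelectWidget.js',
    'ParamWidget.js',
];

printf("directory name is %d bytes long\n", strlen($dir));

foreach ($files as $file) {
    // Every file below the directory truncates to the directory name itself.
    $seen = legacyNameField($dir . $file);
    printf("%-24s %s\n", $file, $seen === $dir ? 'collides with the directory' : 'distinct');
}
```

Under these assumptions every entry below widgets/ truncates to the same 100 bytes, which matches the observation that the extracted "widgets" file has the size of the last file in that directory.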

I don't think that anybody will come up with an improvement to
fully support the pax format, especially since the ustar format
prefix is already supported by Phar (allowing for filenames up to
255 bytes in the best case), and the GNU style ././@LongLink
convention is supported as well (allowing for arbitrary length
filenames).  Therefore I'm changing this to a documentation problem.
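
To make the "255 bytes in the best case" remark concrete, here is a minimal sketch of how a long path can be split into the ustar prefix field (155 bytes) and name field (100 bytes), which are joined by an implicit "/" on reading. The function `ustarSplit()` is a hypothetical helper, not part of Phar, and it returns only one possible split; real archive writers may choose a different one.

```php
<?php
// Hypothetical helper: split a path at a "/" so that the part before fits
// the 155-byte ustar prefix field and the part after fits the 100-byte
// name field. Returns null if no such split exists.
function ustarSplit(string $path): ?array
{
    if (strlen($path) <= 100) {
        // Short enough for the name field alone; no prefix needed.
        return ['prefix' => '', 'name' => $path];
    }

    $offset = 0;
    while (($pos = strpos($path, '/', $offset)) !== false) {
        $prefix = substr($path, 0, $pos);
        $name   = substr($path, $pos + 1);

        if (strlen($prefix) > 155) {
            break; // later split points only make the prefix longer
        }
        if ($prefix !== '' && $name !== '' && strlen($name) <= 100) {
            return ['prefix' => $prefix, 'name' => $name];
        }
        $offset = $pos + 1;
    }

    return null; // not representable in plain ustar
}

$path = 'mediawiki-1.35.0/extensions/TemplateData/modules/'
      . 'ext.templateDataGenerator.editTemplatePage/widgets/ParamWidget.js';

var_dump(ustarSplit($path));
```

The 114-byte path from this report fits such a split, so an archive written in plain ustar format would not need pax records for it.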

> WinRAR doesn't like that directory either.

Neither does 7-Zip.  Tarball builders may consider sticking with
the ustar format (e.g. for GNU tar, by explicitly passing
--format=ustar where necessary) for best portability.
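
For consumers who cannot control how a tarball was built, one option is to check for pax headers before handing the archive to Phar. The following is a rough sketch under stated assumptions: it expects an uncompressed .tar stream, relies on the standard header layout (size field at offset 124, typeflag at offset 156, pax typeflags "x" and "g"), does not verify checksums, and does not handle GNU base-256 size encoding. `tarHasPaxHeaders()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: scan the 512-byte headers of an uncompressed tar
// stream and report whether any pax extended header is present.
function tarHasPaxHeaders($stream): bool
{
    while (($header = fread($stream, 512)) !== false && strlen($header) === 512) {
        // An all-zero block marks the end of the archive.
        if (trim($header, "\0") === '') {
            return false;
        }

        // Typeflag 'x' is a per-file pax header, 'g' a global one.
        $typeflag = $header[156];
        if ($typeflag === 'x' || $typeflag === 'g') {
            return true;
        }

        // Size is stored as an octal string; the entry data is padded
        // to a multiple of 512 bytes.
        $size = octdec(trim(substr($header, 124, 12), " \0"));
        $skip = (int) (ceil($size / 512) * 512);
        if ($skip > 0 && fseek($stream, $skip, SEEK_CUR) !== 0) {
            return false; // truncated or non-seekable input
        }
    }

    return false;
}
```

A possible use with the archive from the test script would be to open 'mediawiki-1.35.0.tar' with fopen() in 'rb' mode, pass the handle to tarHasPaxHeaders(), and fall back to another extraction tool when it returns true.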

By the way, extracting that archive was not possible on Windows,
since several filenames had a trailing dot…

[1] <https://github.com/php/php-src/blob/php-7.4.14/ext/phar/tar.c#L286-L290>
 [2021-01-28 12:01 UTC] cmb@php.net
Automatic comment on behalf of cmbecker69@gmx.de
Revision: http://git.php.net/?p=doc/en.git;a=commit;h=08a16a850e0dfdf1e37899a9d33126b0a333d864
Log: Fix #80326: Phar may not properly extract PAX archives
 [2021-01-28 12:01 UTC] cmb@php.net
-Status: Verified
+Status: Closed
 [2021-01-28 15:49 UTC] mumumu@php.net
Automatic comment on behalf of mumumu@mumumu.org
Revision: http://git.php.net/?p=doc/ja.git;a=commit;h=c285e10c4f9aeaa743476fc3740286cd0eef2e4a
Log: Fix #80326: Phar may not properly extract PAX archives
 