Doc Bug #68556 Fix for zlib.deflate and zlib.inflate filters; fix for example #1
Submitted: 2014-12-06 07:14 UTC Modified: 2016-01-21 00:35 UTC
Votes:4
Avg. Score:4.0 ± 1.7
Reproduced:2 of 3 (66.7%)
Same Version:0 (0.0%)
Same OS:1 (50.0%)
From: salsi at icosaedro dot it Assigned:
Status: Open Package: Streams related
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None
 [2014-12-06 07:14 UTC] salsi at icosaedro dot it
Description:
------------
Example #1 at http://php.net/manual/it/filters.compression.php is wrong: it does not work, and the file read back is not decoded at all. The reason is that the zlib.deflate filter actually generates the ZLIB Compress format (RFC 1950), NOT the DEFLATE format (RFC 1951), while zlib.inflate decompresses DEFLATE, NOT the ZLIB Compress format; the two stream filters are not inverses of each other. So the situation is quite asymmetrical:

DEFLATE compressed format (RFC 1951):
gzdeflate() --> compress
gzinflate() and zlib.inflate --> decompress

ZLIB Compress format (RFC 1950: 2 bytes header + DEFLATE + ADLER32):
gzcompress() and zlib.deflate --> compress
gzuncompress() --> decompress

That's why some users' comments contain the strange recipe "zlib.inflate works on DEFLATE compressed data, but you have to strip away the first two bytes": those two bytes are in fact the ZLIB Compress header, because the supposedly DEFLATE compressed data are actually in ZLIB Compress format; the final ADLER32 checksum is ignored by the DEFLATE decompressor as trailing garbage.
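
The relationship between the two formats can be checked with the string functions alone. This is a minimal sketch, assuming the zlib extension is loaded; the variable names are illustrative only.

```php
<?php
// Sketch: a ZLIB stream (RFC 1950) is a raw DEFLATE stream (RFC 1951)
// wrapped in a 2-byte header and a 4-byte ADLER32 trailer.
$text = "This is a test.\nThis is only a test.\n";

$zlib = gzcompress($text);     // ZLIB format
$raw  = substr($zlib, 2, -4);  // drop the header and the ADLER32 trailer

// What remains decodes as raw DEFLATE.
$roundTrip = gzinflate($raw);
var_dump($roundTrip === $text);

// The complete ZLIB stream needs gzuncompress() instead.
var_dump(gzuncompress($zlib) === $text);
```

The sketch strips the trailer as well as the header, so it does not rely on the decompressor tolerating trailing bytes.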

I'm unsure whether this is a flawed API design rather than a doc issue.

Test script:
---------------
Example #1 with my fix:
<?php
$params = array('level' => 6, 'window' => 15, 'memory' => 9);

$original_text = "This is a test.\nThis is only a test.\nThis is not an important string.\n";
echo "The original text is " . strlen($original_text) . " characters long.\n";

$fp = fopen('test.deflated', 'w');
stream_filter_append($fp, 'zlib.deflate', STREAM_FILTER_WRITE, $params);
fwrite($fp, $original_text);
fclose($fp);

echo "The compressed file is " . filesize('test.deflated') . " bytes long.\n";
echo "The original text was:\n";

// WRONG: does not show the original message.
// Use readfile and zlib.inflate to decompress on the fly:
// readfile('php://filter/zlib.inflate/resource=test.deflated');

// FIX: decode the file as ZLIB Compress format instead.
echo gzuncompress(file_get_contents('test.deflated'));

/* Generates output:
 *
 * The original text is 70 characters long.
 * The compressed file is 56 bytes long.
 * The original text was:
 * This is a test.
 * This is only a test.
 * This is not an important string.
 *
 */
?>



History

 [2016-01-19 12:39 UTC] salsi at icosaedro dot it
Having read RFCs 1950-1952, checked the zlib manual, skimmed the C source that implements zlib.deflate, and experimented myself (see bug #71396 for a test program), I finally reached a conclusion that explains why example #1 cannot work, and why that manual page needs a thorough update:

http://php.net/manual/en/filters.compression.php

I'm not a doc contributor, but the following text might be a start:


zlib.deflate: what it really does and what the 'window' parameter really means
------------------------------------------------------------------------------

The zlib.deflate filter implements the DEFLATE, ZLIB and GZIP compression formats, depending on the value of the 'window' parameter.

The window parameter encodes the size of the internal “history buffer” as its base-2 logarithm, in the range from 8 up to 15. The sign of the parameter and an optional offset of 16 select the output format, as described below.


- DEFLATE (RFC 1951) is a raw compression algorithm without header and without checksum. It is performed when the window parameter is set in the range from -8 up to -15. This compression algorithm is the base for all the formats generated by the zlib.deflate filter.
The corresponding functions that operate directly on strings are gzdeflate() and gzinflate().


- ZLIB (RFC 1950) applies the DEFLATE algorithm and adds a 2-byte header and a 4-byte trailer containing the Adler32 checksum of the uncompressed data in big-endian (network) byte order:

        ZLIB = ZLIBHEADER(2B)  DEFLATE  ADLER32(4B)

The 2-byte header, read as a 16-bit unsigned number in big-endian order, must be a multiple of 31.
This format is generated when the window parameter is set in the range from 8 up to 15.
The corresponding functions that operate directly on strings are gzcompress() and gzuncompress().


- GZIP (RFC 1952) applies the DEFLATE algorithm and adds a header and a trailer with the CRC32 checksum of the uncompressed data and their length, both in little-endian byte order:

        GZIP = GZIPHEADER(10B)  DEFLATE  CRC32(4B)  LENGTH(4B)

This is the format of the .gz files, with the GZIPHEADER containing, in order, the GZIP signature (“\x1f\x8B”), the compression method (“\x08”), 6 zero bytes (flags, modification time, extra flags), and the operating system set to the current system (“\x00” = Windows, “\x03” = Unix, etc.).
This format is generated when the window parameter is set in the range from 8+16=24 up to 15+16=31. Note that the LENGTH part is only 32 bits wide: for uncompressed data longer than 4 GB, only the actual length modulo 2^32 is stored there.
The corresponding functions that operate directly on strings are gzencode() and gzdecode(); the gzopen() function allows reading and writing .gz files.
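
The framing of the three formats can be inspected with the string functions. This is a minimal sketch, assuming the zlib and hash extensions are available; the variable names are illustrative only. Per RFC 1950 the ZLIB header and the Adler32 trailer are read most significant byte first, while the GZIP trailer fields are little-endian.

```php
<?php
// Sketch: inspect the framing that each string function adds around DEFLATE.
$text = str_repeat("This is only a test.\n", 20);

$raw  = gzdeflate($text);   // DEFLATE, RFC 1951: no header, no checksum
$zlib = gzcompress($text);  // ZLIB, RFC 1950
$gzip = gzencode($text);    // GZIP, RFC 1952

// ZLIB: the 2-byte header, read big-endian, is a multiple of 31 ...
$zlibHeader = unpack('n', substr($zlib, 0, 2))[1];
$headerOk = ($zlibHeader % 31 === 0);

// ... and the trailer is the Adler32 checksum, most significant byte first.
$adlerOk = (bin2hex(substr($zlib, -4)) === hash('adler32', $text));

// GZIP: signature and method first, then CRC32 and length as a
// little-endian trailer (the CRC bytes are reversed before comparing).
$magicOk = (substr($gzip, 0, 3) === "\x1f\x8b\x08");
$crcOk   = (bin2hex(strrev(substr($gzip, -8, 4))) === hash('crc32b', $text));
$lenOk   = (unpack('V', substr($gzip, -4))[1] === strlen($text));

var_dump($headerOk, $adlerOk, $magicOk, $crcOk, $lenOk);
```

The checksums are compared as hex strings rather than integers so that the sketch behaves the same on 32-bit and 64-bit builds.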


By default window=-15, which generates a DEFLATE compressed stream with an internal history buffer of 2^15 B. An invalid window parameter may be detected, in which case a rather obscure error is generated:

       E_WARNING: unable to create or locate filter “zlib.deflate”
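
The mapping between window values and output formats described above can be tried out as follows. This is only a sketch of my understanding: compress_with_window() is a hypothetical helper, not part of PHP, and the temporary file is just a convenient way to collect the filter output.

```php
<?php
// compress_with_window() is a hypothetical helper: it writes $data through
// zlib.deflate with the given 'window' value and returns the resulting bytes.
function compress_with_window(string $data, int $window): string
{
    $path = tempnam(sys_get_temp_dir(), 'zf');
    $fp = fopen($path, 'w');
    stream_filter_append($fp, 'zlib.deflate', STREAM_FILTER_WRITE, ['window' => $window]);
    fwrite($fp, $data);
    fclose($fp);  // closing the stream flushes the filter
    $bytes = file_get_contents($path);
    unlink($path);
    return $bytes;
}

$text = str_repeat("This is only a test.\n", 20);

// Each output is decoded with the string function for the expected format.
$asDeflate = gzinflate(compress_with_window($text, -15));     // raw DEFLATE
$asZlib    = gzuncompress(compress_with_window($text, 15));   // ZLIB
$asGzip    = gzdecode(compress_with_window($text, 15 + 16));  // GZIP

var_dump($asDeflate === $text, $asZlib === $text, $asGzip === $text);
```

If the description above is right, each decoder accepts only the output produced with the matching window value.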


All this explains why example #1 does not work and nothing can be read back. With window=15 the filter generates ZLIB, which the following readfile('php://filter/zlib.inflate/resource=test.deflated'); statement cannot read because it expects DEFLATE. The same example works when window is set to the default value, -15.
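
A round trip of example #1 with the window changed to -15 might look like this. It is a sketch only; the temporary file name and the $decoded variable are assumptions of this sketch, not part of the manual's example.

```php
<?php
// Sketch of example #1 with 'window' => -15 (raw DEFLATE), so that
// zlib.inflate with its default settings can read the file back.
$params = ['level' => 6, 'window' => -15, 'memory' => 9];
$original_text = "This is a test.\nThis is only a test.\nThis is not an important string.\n";

// Temporary file used instead of 'test.deflated' (assumption of this sketch).
$path = tempnam(sys_get_temp_dir(), 'deflated');
$fp = fopen($path, 'w');
stream_filter_append($fp, 'zlib.deflate', STREAM_FILTER_WRITE, $params);
fwrite($fp, $original_text);
fclose($fp);

// Read the file back through the zlib.inflate filter.
$decoded = file_get_contents('php://filter/read=zlib.inflate/resource=' . $path);
unlink($path);

var_dump($decoded === $original_text);
```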

The remaining big question is why the original example #1 fails without showing any error. But this is another story.
 [2016-01-21 00:35 UTC] salsi at icosaedro dot it
-Summary: zlib.deflate is not the reverse of zlib.inflate
+Summary: Fix for zlib.deflate and zlib.inflate filters; fix for example #1
 [2016-01-21 00:35 UTC] salsi at icosaedro dot it
Completing description with the zlib.inflate filter:


zlib.inflate filter
-------------------

Only the 'window' parameter is allowed; any other parameter (memory, level) is ignored. Let $W be the base-2 logarithm of the history buffer size, so that 2^$W bytes are allocated by the decompressor. This value must be greater than or equal to the value used for compression, because no attempt is made to reallocate. The range is 9 <= $W <= 15. If unknown, $W=15 is the safest choice.

- DEFLATE (RFC 1951): use window=-$W, with $W being a value not less than that used for compression. If unknown, set window=-15.

- ZLIB (RFC 1950): use window=$W. The value of $W can be read from the ZLIB header itself, otherwise set window=15.

- GZIP (RFC 1952): use window=$W+16. The GZIP header does not record the value of $W, so if it is unknown set window=31.

- ZLIB or GZIP: use window=$W+32 for automatic header detection, so that both formats can be recognized and decompressed; window=15+32=47 is the safest choice.
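
The window values listed above can be exercised on the reading side as follows. Again this is only a sketch: inflate_with_window() is a hypothetical helper, not part of PHP, and php://temp is used merely to get a readable stream over the compressed bytes.

```php
<?php
// inflate_with_window() is a hypothetical helper: it reads $bytes back
// through zlib.inflate using the given 'window' value.
function inflate_with_window(string $bytes, int $window): string
{
    $fp = fopen('php://temp', 'w+');
    fwrite($fp, $bytes);
    rewind($fp);
    stream_filter_append($fp, 'zlib.inflate', STREAM_FILTER_READ, ['window' => $window]);
    $out = stream_get_contents($fp);
    fclose($fp);
    return $out;
}

$text = str_repeat("This is only a test.\n", 20);

$fromDeflate = inflate_with_window(gzdeflate($text), -15);       // raw DEFLATE
$fromZlib    = inflate_with_window(gzcompress($text), 15);       // ZLIB
$fromGzip    = inflate_with_window(gzencode($text), 15 + 16);    // GZIP
$autoZlib    = inflate_with_window(gzcompress($text), 15 + 32);  // auto-detect
$autoGzip    = inflate_with_window(gzencode($text), 15 + 32);    // auto-detect

var_dump(
    $fromDeflate === $text,
    $fromZlib === $text,
    $fromGzip === $text,
    $autoZlib === $text,
    $autoGzip === $text
);
```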


Error detection:

- If the window parameter is not one of those listed above, an error MAY be generated:

	E_WARNING: stream_filter_append(): Invalid parameter give for window size.

- If the $W value is less than that used for compression, an empty string or a short read MAY result, but no error is triggered and no exception is thrown. This issue shows up when highly compressible data (say, a source program at least 40 KB long) that were compressed with a large window (window=15+16) are then decompressed with a smaller window (window=9+16).

- On checksum failure, corrupted or incomplete data are returned, but no error is triggered and no exception is thrown.


(About the missing detection of decoding errors, see bug #71417.)
 