Bug #42117 [PATCH] bzip2.compress loses data in internal buffer
Submitted: 2007-07-26 22:53 UTC Modified: 2007-08-09 23:27 UTC
From: phofstetter at sensational dot ch Assigned: iliaa (profile)
Status: Closed Package: Bzip2 Related
PHP Version: 5.2CVS-2007-08-05 OS: *
Private report: No CVE-ID: None
 [2007-07-26 22:53 UTC] phofstetter at sensational dot ch
When bzip2.compress is attached to a stream and enough data is written that the compressed output exceeds some internal buffer, all data in the last, partially filled buffer seems to get lost on the way out.

The sample code contains quite a lot of filler text, which is needed to fill the internal buffer far enough to trigger the problem.

I have had this problem ever since stream filters were introduced into PHP, but only now have I managed to create a much reduced test case that demonstrates it.

Reproduce code:
$str = "BEGIN (%d)\n
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.    
\nEND (%d)\n";

$h = fopen($_SERVER['argv'][1], 'w');
$f = stream_filter_append($h, "bzip2.compress", STREAM_FILTER_WRITE);
for($x=0; $x < 10000; $x++){
    fprintf($h, $str, $x, $x);
}

echo "Written\n";

Expected result:
If I call the script with

./script.php blah

and then use

bzcat blah

I expect the complete data to be output to the console, ending with

END (9999)

Actual result:
bzcat outputs until somewhere around "END (9207)" and then bails out with

bzcat: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bzcat: Unknown error: 0
        Input file = blah, output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

and true enough - the file is not completely written.


 [2007-07-27 09:21 UTC] phofstetter at sensational dot ch
after finally getting some sleep, today I looked at the problem in bz2_filter.c and I may have found something.

On line 229+ we have

	if (flags & PSFS_FLAG_FLUSH_CLOSE) {
		/* Spit it out! */
		status = BZ_OUTBUFF_FULL;
		while (status == BZ_OUTBUFF_FULL) {
			status = BZ2_bzCompress(&(data->strm), BZ_FINISH);
			if (data->strm.avail_out < data->outbuf_len) {
				size_t bucketlen = data->outbuf_len - data->strm.avail_out;

				bucket = php_stream_bucket_new(stream, estrndup(data->outbuf, bucketlen), bucketlen, 1, 0 TSRMLS_CC);
				php_stream_bucket_append(buckets_out, bucket TSRMLS_CC);
				data->strm.avail_out = data->outbuf_len;
				data->strm.next_out = data->outbuf;
				exit_status = PSFS_PASS_ON;

now the problem is IMHO that BZ2_bzCompress with BZ_FINISH will never return BZ_OUTBUFF_FULL. Looking at the documentation, it will return BZ_RUN_OK until all data has been processed when it will return BZ_FINISH_OK.

So with the code as it currently is in PHP, it will only do one run of BZ2_bzCompress and then stop, even though more calls could be needed.

This is consistent with how the bug manifests itself.

I will try to correct the return code handling, but keep in mind that my C-skills are subpar, so the patch I'm going to post afterwards is probably not as good as it could be, so please have a look at the thing.

 [2007-07-27 10:06 UTC] phofstetter at sensational dot ch
looking at the documentation wasn't enough. When I looked at the source of bzlib, I found this:

BZ2_bzCompress called with BZ_FINISH keeps returning BZ_FINISH_OK (instead of BZ_RUN_OK, which I assumed after reading the docs) until it's really done. Then it will return BZ_STREAM_END.

So the following patch fixes this bug:

--- bz2_filter.c.orig   2007-07-27 11:24:44.000000000 +0200
+++ bz2_filter.c        2007-07-27 11:54:35.000000000 +0200
@@ -228,8 +228,8 @@
        if (flags & PSFS_FLAG_FLUSH_CLOSE) {
                /* Spit it out! */
-               status = BZ_OUTBUFF_FULL;
-               while (status == BZ_OUTBUFF_FULL) {
+               status = BZ_FINISH_OK;
+               while (status == BZ_FINISH_OK) {
                        status = BZ2_bzCompress(&(data->strm), BZ_FINISH);
                        if (data->strm.avail_out < data->outbuf_len) {
                                size_t bucketlen = data->outbuf_len - data->strm.avail_out;

With this modification, the complete data gets written out to the stream.
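
To illustrate why the loop condition matters, here is a minimal sketch in C. It does not use bzlib at all: `toy_finish`, `toy_drain`, `struct toy_stream` and the `TOY_*` codes are hypothetical stand-ins that only model the return-code behaviour described above (the finish call reports "finish OK" while data is still pending and "stream end" once everything is out, and never reports "output buffer full"). Under that assumption, the old condition stops after one output buffer, while the patched condition drains everything.

```c
#include <assert.h>
#include <stddef.h>

/* Toy status codes, named after the bzlib codes discussed in this report.
 * They are illustrative only and are not taken from bzlib.h. */
enum toy_status { TOY_FINISH_OK, TOY_STREAM_END, TOY_OUTBUFF_FULL };

/* Toy compressor state (hypothetical, not bz_stream). */
struct toy_stream {
    size_t pending;   /* bytes still held inside the compressor */
    size_t avail_out; /* free space left in the output buffer */
};

/* Models a "finish" call: emits as much as fits into the output buffer.
 * Reports TOY_FINISH_OK while data remains and TOY_STREAM_END when done.
 * It never returns TOY_OUTBUFF_FULL. */
static enum toy_status toy_finish(struct toy_stream *s)
{
    size_t n = s->pending < s->avail_out ? s->pending : s->avail_out;
    s->pending -= n;
    s->avail_out -= n;
    return s->pending ? TOY_FINISH_OK : TOY_STREAM_END;
}

/* Runs the flush-on-close loop while status == loop_while and returns
 * the number of bytes that were passed on. */
static size_t toy_drain(size_t pending, size_t outbuf_len,
                        enum toy_status loop_while)
{
    struct toy_stream s = { pending, outbuf_len };
    size_t written = 0;
    enum toy_status status = loop_while;

    assert(outbuf_len > 0); /* a zero-sized buffer could never drain */

    while (status == loop_while) {
        status = toy_finish(&s);
        if (s.avail_out < outbuf_len) {
            written += outbuf_len - s.avail_out; /* pass the bucket on */
            s.avail_out = outbuf_len;            /* reset the output buffer */
        }
    }
    return written;
}
```

With 10000 pending bytes and a 4096-byte output buffer, `toy_drain(10000, 4096, TOY_OUTBUFF_FULL)` passes on only the first 4096 bytes, which matches the truncated file seen in this report, whereas `toy_drain(10000, 4096, TOY_FINISH_OK)` passes on all 10000.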

Please consider applying this patch as without it, the bzip2.compress filter will sometimes (often - if the data is large enough to be bigger than the internal buffer) create corrupted data.

PS: The patch is against 5.2.2 as I'm unable to compile 5.2.3 on OSX with GD enabled due to gcc being called with an empty -L tag somewhere in configure.
 [2007-08-05 14:26 UTC] phofstetter at sensational dot ch

even with the latest snapshot, the bug is still there. Data in bzip's internal buffer is lost on stream close due to the problem I discovered in bz2_filter.c

The patch I proposed earlier seems to do the right thing and in fact is now creating more than 1000 correct bzip2 streams per day, so I think it's safe to say that it really does its job :-)

It's incorrect to compare the return code of BZ2_bzCompress(&(data->strm), BZ_FINISH); with BZ_OUTBUFF_FULL, as BZ2_bzCompress *never* returns BZ_OUTBUFF_FULL (which is a return value of one of the higher-level convenience functions in bzlib).

 [2007-08-05 17:13 UTC]
See also bug #29521

 [2007-08-08 08:21 UTC]
Ilia, can you check this out please? See also bug #29521 which is also assigned to you. :)
 [2007-08-09 23:27 UTC]
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
Thank you for the report, and for helping us make PHP better.
