Bug #80384 filter buffers entire read until file closed
Submitted: 2020-11-20 04:14 UTC
Modified: 2020-12-23 12:44 UTC
From: adamjseitz at gmail dot com
Assigned: cmb
Status: Closed
Package: Filter related
PHP Version: 7.4.12
OS: Debian Buster on WSL2
Private report: No
CVE-ID: None
 [2020-11-20 04:14 UTC] adamjseitz at gmail dot com
Description:
------------
"zlib.inflate" filters appear to buffer all previously-read data, as exhibited by the attached code, until the file is closed.

It is unclear from my testing if the data being kept in memory is the original or inflated copy.

Test script:
---------------
<?php
// Generate "data.gz" file with:
// dd if=/dev/urandom of=data count=$((8*1024*1024)) iflag=count_bytes; gzip data

function PrintMem($message) { echo $message, ": ", (memory_get_usage() / 1024 / 1024) . " MB\n"; }

$fp = fopen("data.gz", 'rb');
$filter = stream_filter_append($fp, "zlib.inflate", STREAM_FILTER_READ, array( 'window' => 31 ));

PrintMem("before read");
stream_get_contents($fp);
PrintMem("after read");

stream_filter_remove($filter);
fclose($fp);
PrintMem("after close");
?>

Expected result:
----------------
I expect that very little memory is used.

It should be possible to read a very large gzipped file in smaller blocks without buffering everything that has been read so far, but that does not seem to be the case.
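
For illustration, the block-wise access pattern I have in mind looks roughly like the sketch below. The helper name countInflatedBytes() and the 8192-byte block size are assumptions made for this example, and the sample file is generated with gzencode() instead of dd/gzip so the snippet is self-contained.

```php
<?php
// Hypothetical helper: inflate a gzip file block by block and only count
// the decompressed bytes, so no block needs to outlive its loop iteration.
function countInflatedBytes(string $path, int $blockSize = 8192): int
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("cannot open $path");
    }
    // Same filter setup as in the test script above.
    stream_filter_append($fp, 'zlib.inflate', STREAM_FILTER_READ, ['window' => 31]);

    $total = 0;
    while (!feof($fp)) {
        $block = fread($fp, $blockSize);
        if ($block === false) {
            break;
        }
        $total += strlen($block);
    }
    fclose($fp);
    return $total;
}

// Build a small gzip file (1 MiB of random data) and read it back in blocks.
$path = tempnam(sys_get_temp_dir(), 'gz');
file_put_contents($path, gzencode(random_bytes(1024 * 1024)));
$total = countInflatedBytes($path);
unlink($path);

echo "inflated bytes: ", $total, "\n";
echo "peak memory: ", memory_get_peak_usage() / 1024 / 1024, " MB\n";
```

With a pattern like this, memory use should ideally stay near the block size no matter how large the file is.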

Actual result:
--------------
OUTPUT from the above script:

before read: 0.44554138183594 MB
after read: 8.4377517700195 MB
after close: 0.37459564208984 MB

History

 [2020-11-20 15:05 UTC] cmb@php.net
-Summary: zlib.inflate filter buffers entire read until file closed
+Summary: filter buffers entire read until file closed
-Status: Open
+Status: Verified
-Package: Zlib related
+Package: Filter related
 [2020-11-20 15:05 UTC] cmb@php.net
This is not particularly related to zlib.inflate, but rather a
general issue for streams which have any filter attached.
php_stream_fill_read_buffer()[1] only reads a single chunk if no
filter is attached, but the full size if a filter is attached.
Frankly, it's not clear to me why we're looping[2] inside that
function.

[1] <https://github.com/php/php-src/blob/php-7.4.12/main/streams/streams.c#L540>
[2] <https://github.com/php/php-src/blob/php-7.4.12/main/streams/streams.c#L552>
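
To check that the behavior is not specific to zlib.inflate, a variant of the test script with a different built-in filter can be used. The sketch below is an assumption-laden example rather than a verified reproduction: it picks string.rot13 as a stand-in for "any filter", and the helper names memMb() and measureFilteredRead() are made up here. The numbers it prints will depend on the PHP version.

```php
<?php
// Hypothetical helper: current memory usage in MB, like PrintMem() in the report.
function memMb(): float
{
    return memory_get_usage() / 1024 / 1024;
}

// Hypothetical helper: repeat the measurements from the report with an
// arbitrary read filter on a plain temporary file of $bytes bytes.
function measureFilteredRead(string $filterName, int $bytes): array
{
    $path = tempnam(sys_get_temp_dir(), 'flt');
    file_put_contents($path, str_repeat('a', $bytes));

    $fp = fopen($path, 'rb');
    $filter = stream_filter_append($fp, $filterName, STREAM_FILTER_READ);

    $result = ['before read' => memMb()];
    stream_get_contents($fp);            // result discarded, as in the report
    $result['after read'] = memMb();

    stream_filter_remove($filter);
    fclose($fp);
    $result['after close'] = memMb();

    unlink($path);
    return $result;
}

$measurements = measureFilteredRead('string.rot13', 8 * 1024 * 1024);
foreach ($measurements as $label => $mb) {
    printf("%s: %.2f MB\n", $label, $mb);
}
```

If "after read" stays well above "before read" until the stream is closed, the stream's read buffer is holding on to the filtered data.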
 [2020-11-21 22:19 UTC] adamjseitz at gmail dot com
The following pull request has been associated:

Patch Name: Fix #80384: limit read buffer size
On GitHub:  https://github.com/php/php-src/pull/6444
Patch:      https://github.com/php/php-src/pull/6444.patch
 [2020-11-21 23:04 UTC] adamjseitz at gmail dot com
Thanks for pointing me to the right code.

The loop appears to be necessary in the case that the filter chain consumes the full chunk, but yields no data. Some callers expect that _php_stream_fill_read_buffer will always add data to the read buffer in successful cases [1].

I added a PR to break out of the loop once stream->chunk_size has been read out of the filters.

[1] <https://github.com/php/php-src/blob/php-7.4.12/main/streams/streams.c#L982>
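
To make the reasoning above easier to follow, here is a rough userland model of the loop. It is only a sketch in PHP, not the C code from streams.c: fillReadBuffer(), makeSource() and makeSwallowingFilter() are hypothetical names, and the "limit" flag is my simplified reading of the change, namely that the loop stops once either the requested size or the chunk size is available.

```php
<?php
// Model of filling a read buffer through a filter chain.
// $readChunk($n) returns up to $n raw bytes ('' at EOF);
// $filter($raw) returns the filtered output, which may be ''.
function fillReadBuffer(callable $readChunk, callable $filter,
                        int $size, int $chunkSize, bool $limit): string
{
    // Without the limit the loop runs until $size bytes are buffered;
    // with it, the loop stops once min($size, $chunkSize) bytes are available.
    $target = $limit ? min($size, $chunkSize) : $size;
    $buffer = '';
    while (strlen($buffer) < $target) {
        $raw = $readChunk($chunkSize);
        if ($raw === '') {
            break;                       // EOF on the underlying stream
        }
        $buffer .= $filter($raw);        // may add nothing, so keep looping
    }
    return $buffer;
}

// Hypothetical raw source that serves $data sequentially.
function makeSource(string $data): callable
{
    $pos = 0;
    return function (int $n) use (&$pos, $data): string {
        $raw = (string) substr($data, $pos, $n);
        $pos += strlen($raw);
        return $raw;
    };
}

// Hypothetical filter that consumes its first chunk without yielding data.
function makeSwallowingFilter(): callable
{
    $calls = 0;
    return function (string $raw) use (&$calls): string {
        return $calls++ === 0 ? '' : $raw;
    };
}

$data = str_repeat('x', 100);
$unbounded = fillReadBuffer(makeSource($data), makeSwallowingFilter(), 100, 10, false);
$limited   = fillReadBuffer(makeSource($data), makeSwallowingFilter(), 100, 10, true);

echo "without limit: ", strlen($unbounded), " bytes buffered\n";
echo "with limit:    ", strlen($limited), " bytes buffered\n";
```

In this model the first chunk produces no output, so a single pass would leave the buffer empty; that is the case the loop exists for. The limit only changes how much is buffered once data starts flowing.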
 [2020-12-23 12:44 UTC] cmb@php.net
-Assigned To:
+Assigned To: cmb
 [2020-12-23 12:55 UTC] cmb@php.net
-Status: Verified
+Status: Closed
 [2020-12-23 12:55 UTC] cmb@php.net
Automatic comment on behalf of adamjseitz@gmail.com
Revision: http://git.php.net/?p=php-src.git;a=commit;h=70dfbe00684eb1c31d5b49f643e4736696c3b7df
Log: Fix #80384: limit read buffer size
 