Bug #44164 Handle "Content-Length" HTTP header when zlib.output_compression active
Submitted: 2008-02-19 10:48 UTC Modified: 2013-08-01 19:36 UTC
Votes:13
Avg. Score:4.4 ± 0.6
Reproduced:12 of 12 (100.0%)
Same Version:3 (25.0%)
Same OS:5 (41.7%)
From: mplomer at gmx dot de Assigned: cataphract (profile)
Status: Closed Package: *General Issues
PHP Version: 5.2.5 OS: *
Private report: No CVE-ID: None
 [2008-02-19 10:48 UTC] mplomer at gmx dot de
Description:
------------
If you have a big project where the "Content-Length" header is set in many places and you then turn on "zlib.output_compression", you have a problem: the original header is still passed through, but it is now invalid, because it contains the length of the uncompressed data instead of the compressed data.
This confuses HTTP clients that rely on "Content-Length".
I think "zlib.output_compression" should be completely transparent, so this issue should be handled in PHP core.
One solution is to remove the "Content-Length" header (if set) in ob_gzhandler() in ext/zlib/zlib.c:882 (where the "Content-Encoding: gzip" header is set). If you see a good way to set Content-Length to the compressed length, that would of course be the best solution, but IMHO this is only possible if the entire output is buffered first (or is this done by ob_gzhandler anyway?).
This would make the various workarounds that are needed when "zlib.output_compression" is active unnecessary, and it is completely backward compatible.

Reproduce code:
---------------
In php.ini:
zlib.output_compression = On

<?php

  $output = str_repeat('A', 8192);
  header('Content-Length: ' . strlen($output));

?>


Expected result:
----------------
A correct "Content-Length" header, or no "Content-Length" header at all in the response, because the old length is always wrong. It is better to send no "Content-Length" than a wrong one.

Actual result:
--------------
The old "Content-Length" header is sent.
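
A minimal sketch of the mismatch, assuming the zlib extension is loaded and using gzencode() as a stand-in for what zlib.output_compression would send to a gzip-capable client (the exact compressed size in a real request may differ):

```php
<?php
// Compare the length the script announces with the length of the compressed body.
$output     = str_repeat('A', 8192);
$announced  = strlen($output);     // value the script puts into Content-Length
$compressed = gzencode($output);   // approximation of the compressed response body
$actual     = strlen($compressed);

printf("announced: %d bytes, compressed body: %d bytes\n", $announced, $actual);
```

A client that trusts the announced value would wait for bytes that never arrive.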

 [2008-02-19 11:04 UTC] mplomer at gmx dot de
Another idea is:
IF the Content-Length header IS set from the script,
THEN buffer the complete output and calculate the correct compressed Content-Length.
ELSE do nothing (do not add any Content-Length header)

So ob_flush() and flush() will still work if compression is active. (They do not work with Apache's mod_deflate, which is why "zlib.output_compression" is required.)
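
The IF/THEN/ELSE rule above could be sketched roughly as follows. compress_response() is a hypothetical helper, not part of PHP, and for simplicity it works on an already buffered body; in the ELSE case a real handler would keep streaming instead of buffering.

```php
<?php
/**
 * Hypothetical helper: takes raw header lines and a fully buffered body,
 * returns [new header lines, gzip-compressed body].
 */
function compress_response(array $headers, string $body): array
{
    $hadLength = false;
    $out = [];
    foreach ($headers as $line) {
        if (stripos($line, 'Content-Length:') === 0) {
            $hadLength = true;   // remember it, but drop the stale value
            continue;
        }
        $out[] = $line;
    }

    $compressed = gzencode($body);
    $out[] = 'Content-Encoding: gzip';

    if ($hadLength) {
        // The script asked for a length, so send the correct compressed one.
        $out[] = 'Content-Length: ' . strlen($compressed);
    }
    // Otherwise: add no Content-Length header at all.

    return [$out, $compressed];
}
```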
 [2010-10-23 14:44 UTC] cataphract@php.net
-Package: Feature/Change Request +Package: *General Issues -Assigned To: +Assigned To: cataphract
 [2010-10-23 15:04 UTC] cataphract@php.net
I've just been hit by this problem and I'm thinking a better option would be to disable zlib.output_compression if the Content-Length header is set. What do you think?

Buffering everything and then calculating the actual length is not really an option, the output could be several hundred megabytes.
 [2010-10-23 16:49 UTC] mplomer at gmx dot de
In the meantime we do output-compression only via apache. But back to the problem:

First I thought it would be better to remove the header if present (buffering the output is not an option, of course), but disabling output compression completely when this header is set is also an idea (that is a rare case in our environment and, IMHO, for most PHP users too).

Our problem was that the Varnish reverse proxy was confused by these inconsistent headers. Both solutions would solve this.
 [2010-10-26 04:16 UTC] cataphract@php.net
Automatic comment from SVN on behalf of cataphract
Revision: http://svn.php.net/viewvc/?view=revision&revision=304903
Log: - Implemented request #44164, zlib.output_compression is now implicitly
  disabled when the header "Content-length" is set.
#One could argue that any output handler could change the size of the
#response, so this exception for zlib.output_compression is an
#inconsistency. However, zlib.output_compression is presented as a
#performance setting, whose value should have no effect on the
#correctness of the scripts. This was not the case. Setting the
#header "content-length" and enabling zlib.output_compression was
#a recipe for infringing section 4.4 of RFC 2616.
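
A userland approximation of the behaviour described in this log entry, as a sketch only: has_content_length() is a hypothetical helper, and the ini_set() call assumes it runs before any output has been sent, while the setting can still be changed.

```php
<?php
/** Hypothetical helper: true if any of the given header lines is a Content-Length header. */
function has_content_length(array $headerLines): bool
{
    foreach ($headerLines as $line) {
        if (stripos(ltrim($line), 'content-length:') === 0) {
            return true;
        }
    }
    return false;
}

// In a web request, before output starts: give up compression rather than
// send a Content-Length that no longer matches the body.
if (has_content_length(headers_list())) {
    ini_set('zlib.output_compression', '0');
}
```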
 [2010-10-26 04:16 UTC] cataphract@php.net
-Status: Assigned +Status: Closed
 [2010-10-30 18:32 UTC] cataphract@php.net
-Status: Closed +Status: Assigned
 [2010-10-30 18:32 UTC] cataphract@php.net
I'm reopening as the general opinion seems to be that the content-length header should be suppressed instead of disabling compression.
 [2010-11-18 05:09 UTC] cataphract@php.net
Automatic comment from SVN on behalf of cataphract
Revision: http://svn.php.net/viewvc/?view=revision&revision=305481
Log: - Reversed implementation of FR #44164, pending further consideration.
  See rev #304903.
 [2010-12-13 17:45 UTC] panczel dot levente at groware dot hu
Our projects make heavy use of Content-Length. Disabling it unnecessarily is costly on networks with large RTT.
I also do not agree that manipulating Content-Length is a bad thing to do for output handlers. To give a correct Content-Length (whenever possible) is the task of the handler just as setting the "Content-Encoding: gzip" is.
I'd vote for a solution where zlib output generates a correct Content-Length whenever it has the opportunity (regarding the current settings). The most straightforward solution I can imagine is that the output compression module waits until the first buffer flush and then right before writing to its output it checks whether the input has finished [i.e. the whole page is buffered] and that the compressed+encoded length is known; then it sets the correct Content-Length. If, but _only_ if any of the above requirements are not met (input still pending, compressed size is unknown for compression cannot complete without flushing first) then clear Content-Length and flush (so it cannot be set anymore).
I think this would maintain correctness, does not need additional resources (like extra buffering), but keeps the benefits of sending Content-Length whenever possible. (This last one I find to be a huge benefit with pages that include many generated CSS-parts or for pages that dynamically load many files, like dojo.)
 [2010-12-14 04:57 UTC] cataphract@php.net
-Summary: Handle "Content-Length" HTTP header when zlib.output_compression active +Summary: Crash when using unserialized DatePeriod instance -Type: Feature/Change Request +Type: Bug -Package: *General Issues +Package: Date/time related -Operating System: * +Operating System: Windows XP SP3 -PHP Version: 5.2.5 +PHP Version: 5.3.3
 [2010-12-14 04:57 UTC] cataphract@php.net
> Our projects make heavy use of Content-Length. Disabling it unnecessarily is
> costly on networks with large RTT.

The problem is not the existence of a Content-length header, it's the fact that you're setting a content-length header indicating a size you cannot possibly know. A wrong Content-length header is worse than none.

Apache already adds a Content-length header when it can (i.e. for small responses), so it's not necessary for PHP to do this; sending it on every compressed response is impractical because it would require buffering the entire response. If you need this, I suppose you can always explicitly start the zlib output handler and call ob_get_length.
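
A sketch of that workaround, with one substitution named plainly: the inner buffer uses a small gzencode() closure as a stand-in for ob_gzhandler, so the example does not depend on the request's Accept-Encoding header. buffer_compressed() is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: runs $render inside a compressing output buffer and
 * returns [compressed body, its length as reported by ob_get_length()].
 */
function buffer_compressed(callable $render): array
{
    ob_start();                              // outer buffer: receives the compressed bytes
    ob_start(function (string $buf): string {
        return gzencode($buf);               // stand-in for ob_gzhandler
    });
    $render();                               // script output lands in the inner buffer
    ob_end_flush();                          // run the callback, pass result to the outer buffer
    $length = ob_get_length();               // size of the compressed body
    $body   = ob_get_clean();
    return [$body, $length];
}

list($body, $length) = buffer_compressed(function () {
    echo str_repeat('A', 8192);
});

// In a web request one would now send:
//   header('Content-Encoding: gzip');
//   header('Content-Length: ' . $length);
//   echo $body;
```

Note that this buffers the whole response in memory, which is exactly the cost discussed above.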
 [2010-12-14 04:58 UTC] cataphract@php.net
-Summary: Crash when using unserialized DatePeriod instance +Summary: Handle "Content-Length" HTTP header when zlib.output_compression active -Package: Date/time related +Package: *General Issues -Operating System: Windows XP SP3 +Operating System: * -PHP Version: 5.3.3 +PHP Version: 5.2.5
 [2010-12-14 04:59 UTC] cataphract@php.net
Sorry for the mess; I was betrayed by the browser's autocomplete.
 [2010-12-15 01:25 UTC] panczel dot levente at groware dot hu
Sorry for not being clear enough, let me explain! To put things simple I’ll use two examples: [A] the one above with the 8K ‘A’ characters and the following [B]:
<?php header('Content-Length: 0'); ?>

> The problem is not the existence of a Content-length header
I never wrote that its existence would be a problem. On the contrary: I think its correct presence is desirable wherever possible (most possible requests and most possible layers of the runtime environment).

> it's the fact that you're setting a content-length header indicating a size you cannot possibly know
That’s an error. Both scripts set the correct CL (that they know very well), just the way the specification says they SHOULD. I don’t agree that it would be the responsibility of the script to counteract the setting (zlib output compression in this case) of the executing framework (PHP in this case). If scripts had to take care of every such situation, then using header() would be completely illegal, because a future output handler might interact with the output in a way that invalidates the headers already set. This isn’t a portable philosophy, since it implicitly requires the script to be aware of every aspect of the plugins and settings in PHP.
In fact it is the zlib output handler that was setting the wrong CL header (by not removing the outdated one). As I see it, the handler is constructing a new response entity instead of the one it receives from the script; the consistency of this response is entirely the responsibility of the handler. As I understand it, this has now been patched so that the handler always removes the CL header, and thereby ensures correctness. Note: I am not disputing the correctness of the patched handler.

> Apache already adds a Content-length header when it can (i.e. for small responses), it's not necessary PHP does this
Didn’t mean to suggest it would be necessary. It just yields better performance (if the cost of generating the CL is not high).

> sending it on every compressed response is unpractical because it would require buffering the entire response
Not for every compressed response; that would be impossible e.g. for live streams. But on the other hand ALWAYS discarding CL is the worst one among the correct solutions. Consider example [B]: I imagine that the script has already finished once the handler receives control, thus it is able to see that its input (from the script) is already closed. In this case it does not have to use buffers or make heavy computations: by skipping compression entirely everything is set, and a correct CL is transmitted. I’ll get back to this.

> I suppose you can always
Yes, one can always make patches to avoid specific errors that a buggy RTE produces. I just hope there’s no software engineer who sees this as a reason against fixing a bug.

Now back to example [B]. I see it’s not a common use case, but I think it sheds light on other problems too. Let’s distinguish administrators, who control the webserver (or other environment) and PHP settings but must not edit application code, from software designers, who have to create a versatile PHP application that runs efficiently on any platform without having influence on the specific settings of that platform. So the developer doesn’t say “please turn compression off” and the admin doesn’t add new lines of code to the script. Let’s assume the zlib handler has a small buffer (probably the one it already has, configurable with the value of zlib.output_compression). The handler initially writes compressed output into this buffer. If it has to flush the buffer before input EOF, then it clears CL, flushes, and replaces itself with the compressor component in the stream chain (or does whatever else it does now to compress the response body). Otherwise it computes and sets the correct CL and sends the compressed body of the response.
In this manner correct applications can be written that have the benefit of using CL without having to care whether zlib is enabled; software designers can rest assured that their code is good and runs efficiently. The admin can switch zlib on/off as he sees fit: he will neither break the served apps nor cripple their performance. And even better: when the admin sees that 1% of the responses are <4K (the default zlib buffer size), 98% are between 4K and 20K and only 1% are >20K, he can just go “Why wouldn’t I sacrifice that 16K/request of RAM to have Content-Length almost always sent to the client, in contrast to the current habit of almost never sending it?!” … and how right he would be. As you can see, this solution works not only for the 0-long [B] and the 8K-long [A], but possibly for any environment, since sticking with the defaults gives a good tradeoff while maintenance personnel can fine-tune this behavior without modifying the code or imposing constraints on it.
The answer showed that my previous post wasn’t verbose enough to express my opinion: striving towards such quality solutions as sketched in the last part _might_ be a better option than choosing the simplest solution (as the current one is). And I’m pretty sure that the ones who wrote the zlib handler can think of solutions that are both more elegant and more efficient and provide the Web with as many correct CL headers as possible.
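
The buffer-threshold idea from this comment could be sketched as below. This is an illustration under assumptions, not how the zlib handler works: respond() is a hypothetical function, it uses the incremental deflate_init()/deflate_add() API (PHP 7.0+), and instead of writing to a client it only reports whether a Content-Length could have been sent.

```php
<?php
/**
 * Hypothetical sketch: compress $chunks incrementally. If the compressed
 * data outgrows $bufferSize before the input ends, Content-Length has to
 * be dropped (null); otherwise the exact compressed length is known.
 */
function respond(iterable $chunks, int $bufferSize): array
{
    $ctx      = deflate_init(ZLIB_ENCODING_GZIP);
    $buffer   = '';
    $streamed = false;

    foreach ($chunks as $chunk) {
        $buffer .= deflate_add($ctx, $chunk, ZLIB_NO_FLUSH);
        if (strlen($buffer) > $bufferSize) {
            // Forced to flush before input EOF: a real handler would clear
            // Content-Length and write these bytes to the client here.
            $streamed = true;
            $buffer   = '';
        }
    }
    $buffer .= deflate_add($ctx, '', ZLIB_FINISH);   // input EOF reached

    return ['content_length' => $streamed ? null : strlen($buffer)];
}

// Small, compressible page: fits the buffer, so the length is known.
var_dump(respond([str_repeat('A', 8192)], 4096));
// Large, incompressible output: the buffer overflows, so no length.
var_dump(respond([random_bytes(100000), random_bytes(100000)], 4096));
```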
 [2010-12-17 16:35 UTC] cataphract@php.net
> That’s an error. Both scripts set the correct CL (that they know very well),
> just the way the specification says they SHOULD. I don’t agree that it would
> be the responsibility of the script to counteract the setting (zlib output
> compression in this case) of the executing framework (PHP in this case). If
> the scripts should take care for every such situation then using the header()
> would be completely illegal, because a future output handler might interact
> with the output in such a way that invalidates the headers set. This isn’t a
> portable philosophy since it implicitly requires the script being aware of
> every aspect of plugins and settings in PHP.
> In fact it is the zlib output handler that was setting the wrong CL header (by
> not removing the deprecated one). As I see, the handler is constructing a new
> response entity instead of the one it receives from the script; the consistency of
> this response is entirely the responsibility of the handler. As I understand
> this has now been patched so that the handler always removes the CL header, and
> by that it assures correctness. Note: here’s no refutation of the correctness
> of the patched handler.

The problem is that zlib.output_compression is not presented as an output handler that rewrites the response and creates a new entity. It is presented as an inoffensive performance option that compresses the output for better performance. And it does so, generally, without the express assent of the programmer. The programmer can always use ob_gzhandler to force compression.

Your thesis is that the output handler should not be deactivated; instead it ought to remove the old header and write a new one, whenever possible. This looks good. But consider this script:

// $fp, $filesize, is_num_int() and InternalError() are assumed to be defined elsewhere
if (empty($_SERVER["HTTP_RANGE"])) {
    $offset = 0;
}
else { //violates rfc2616, which demands ignoring the header if invalid
    preg_match("/^bytes=(\d+)-/i",$_SERVER["HTTP_RANGE"], $matches);
    if (!isset($matches[1])) // header did not match: serve the whole file
        $offset = 0;
    elseif (is_num_int($matches[1]) && $matches[1] < $filesize && $matches[1]>=0) {
        $offset = $matches[1];
        if (@fseek($fp,$offset,SEEK_SET) != 0)
            InternalError();
        header("HTTP/1.1 206 Partial Content");
        header("Content-Range: bytes $offset-".($filesize - 1)."/$filesize");
    }
    elseif ($matches[1] > $filesize) {
        header("HTTP/1.1 416 Requested Range Not Satisfiable");
        die();
    }
    else $offset = 0;
}
$conlen = $filesize - $offset;

header("Content-Length: $conlen");

There is no way this script can work correctly under the zlib handler. 206 responses must have a content-length and the offsets are calculated from the uncompressed size, while under zlib they would have to be calculated from the compressed size, which is obviously impossible to know without first compressing the file.

So actually the only option is to disable the zlib output handler.
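
To put numbers on this, a small worked example; the 1000-byte file and the offset of 200 are made-up values, and gzencode() only approximates what the zlib handler would produce:

```php
<?php
// Illustrative numbers only; the file contents are an assumption.
$file     = str_repeat('A', 1000);
$filesize = strlen($file);   // 1000
$offset   = 200;             // as if the client sent "Range: bytes=200-"

// What the script above would compute from the uncompressed file:
$contentRange = "bytes $offset-" . ($filesize - 1) . "/$filesize";
$conlen       = $filesize - $offset;

// What would actually be transferred if the slice were gzip-compressed:
$slice      = substr($file, $offset);
$compressed = gzencode($slice);

echo "Content-Range: $contentRange\n";
echo "Content-Length set by script: $conlen\n";
echo "Compressed body length: " . strlen($compressed) . "\n";
```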
 [2010-12-17 19:18 UTC] panczel dot levente at groware dot hu
Thanks, you are absolutely right pointing at my error: my suggestion would not work in situations where a Content-Length header was mandatory or referenced uncompressed body length. The partial response 206, as I understand, doesn’t make Content-Length mandatory. In fact the last line might be omitted from your example and that is still a valid response. But since Content-Length is not mandatory in this case either, I think my thesis still works.

I have not found any explicit remarks in the specification on how offsets and Content-Encoding should interact. As I see now all fields are about the document-entity (the one that the script handles and knows well) except for Content-Encoding and Content-Length fields which are about the representation of the message body. So Content-Length always shows the decimal number of octets transferred in the message body’s final byte-stream, and Content-Encoding has to be reversed before other processing (like matching it to the requested range’s size) takes place.
For all response types where Content-Length is mandatory, I agree with you that compression should be turned off (possibly after trying to fit the output into the one initial buffer that I think is allocated anyway). But we know that for a 200 response it is not mandatory and, as I see it, not for 206 either. So at least these responses could follow my thesis (as could any others that currently do not require a Content-Length field).

> The problem is that zlib.output_compression is not presented as an output handler that rewrites the response and creates a new entity. It is presented as an inoffensive performance option that compresses the output for better performance.
Yes, it rewrites the response; but no, it does not create a new entity. I think that’s just what http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11 references by “without losing the identity of its underlying media type”. So to send a compressed body, one just has to adjust the Content-Encoding field and take care that Content-Length is not invalid. I feel that changing these headers isn’t more intrusive than altering body octets, since they do not affect other content and headers in the message, except for Transfer-Encoding which I suppose that zlib compression correctly adjusts to. I think chunked Transfer-Encoding is relevant for two reasons. If received from the script, it has to be assembled before compression. And it might be used to maintain persistent connections (e.g. 1 compressed buffer in each chunk) where compression was not able to tell the Content-Length in advance.

Please understand that I’m not pushing for any of these features, just think that this topic still has potential for inspiring improvement and finding rare bugs.
 [2012-02-16 10:00 UTC] daniel at code-emitter dot com
FYI: This issue is still causing problems.
http://tracker.phpbb.com/browse/PHPBB3-10648
 [2013-08-01 14:19 UTC] mike@php.net
Why is this open, despite the patch being still applied to SAPI.c?
 [2013-08-01 19:36 UTC] cataphract@php.net
-Status: Assigned +Status: Closed
 
Last updated: Fri Apr 19 03:01:27 2024 UTC