Bug #24083 Content-Encoding: gzip output includes the wrong Content-Length header
Submitted: 2003-06-08 15:24 UTC Modified: 2003-06-12 13:51 UTC
From: yngve at opera dot com Assigned:
Status: Not a bug Package: Zlib related
PHP Version: 4.3.1 OS:
Private report: No CVE-ID: None
 [2003-06-08 15:24 UTC] yngve at opera dot com
I am a developer with Opera Software, the developer of the Opera browser.

We have received several reports about problems with forums using vBulletin, but based on what I have learned I suspect that the problem is a general PHP problem, not a vBulletin-specific one.

The cause of the problem turns out to be the same as, or similar to, the one in my earlier bug report #12884.

A while ago I was in contact with one of the boards affected by the problem, Rage3D.com. They reported back that they were able to solve the problem by turning off global compression.

From the Rage3D people (with permission):
>The problem was caused because we were running the php "output_handler =
>ob_gzhandler" for global site compression.  vB didn't know to use the
>right content length header.  We've resolved the issue by turning off
>global compression and enabling gzip encoding through vBulletin.  We
>will be seeing if an updated gzip library for our server will resolve
>the issue so we can run global php compression in the future.


The following is a hexdump of the request and response headers from one of my tests (before Rage3D fixed the problem).

-----------------------------------
 
0000 : 47 45 54 20 2f 62 6f 61 72 64 2f 20 48 54 54 50   GET /board/ HTTP
0010 : 2f 31 2e 31 0d 0a 55 73 65 72 2d 41 67 65 6e 74   /1.1..User-Agent
0020 : 3a 20 4d 6f 7a 69 6c 6c 61 2f 34 2e 30 20 28 63   : Mozilla/4.0 (c
0030 : 6f 6d 70 61 74 69 62 6c 65 3b 20 4d 53 49 45 20   ompatible; MSIE
0040 : 36 2e 30 3b 20 57 69 6e 64 6f 77 73 20 4e 54 20   6.0; Windows NT
0050 : 35 2e 30 29 20 4f 70 65 72 61 20 37 2e 31 31 20   5.0) Opera 7.11
0060 : 20 5b 65 6e 5d 0d 0a 48 6f 73 74 3a 20 77 77 77   [en]..Host: www
0070 : 2e 72 61 67 65 33 64 2e 63 6f 6d 0d 0a 41 63 63   .rage3d.com..Acc
0080 : 65 70 74 3a 20 74 65 78 74 2f 68 74 6d 6c 2c 20   ept: text/html,
0090 : 69 6d 61 67 65 2f 70 6e 67 2c 20 69 6d 61 67 65   image/png, image
00a0 : 2f 6a 70 65 67 2c 20 69 6d 61 67 65 2f 67 69 66   /jpeg, image/gif
00b0 : 2c 20 69 6d 61 67 65 2f 78 2d 78 62 69 74 6d 61   , image/x-xbitma
00c0 : 70 2c 20 2a 2f 2a 3b 71 3d 30 2e 31 0d 0a 41 63   p, */*;q=0.1..Ac
00d0 : 63 65 70 74 2d 4c 61 6e 67 75 61 67 65 3a 20 65   cept-Language: e
00e0 : 6e 3b 71 3d 31 2e 30 2c 65 6e 3b 71 3d 30 2e 39   n;q=1.0,en;q=0.9
00f0 : 0d 0a 41 63 63 65 70 74 2d 43 68 61 72 73 65 74   ..Accept-Charset
0100 : 3a 20 77 69 6e 64 6f 77 73 2d 31 32 35 32 2c 20   : windows-1252,
0110 : 75 74 66 2d 38 2c 20 75 74 66 2d 31 36 2c 20 69   utf-8, utf-16, i
0120 : 73 6f 2d 38 38 35 39 2d 31 3b 71 3d 30 2e 36 2c   so-8859-1;q=0.6,
0130 : 20 2a 3b 71 3d 30 2e 31 0d 0a 41 63 63 65 70 74    *;q=0.1..Accept
0140 : 2d 45 6e 63 6f 64 69 6e 67 3a 20 64 65 66 6c 61   -Encoding: defla
0150 : 74 65 2c 20 67 7a 69 70 2c 20 78 2d 67 7a 69 70   te, gzip, x-gzip
0160 : 2c 20 69 64 65 6e 74 69 74 79 2c 20 2a 3b 71 3d   , identity, *;q=
0170 : 30 0d 0a 43 6f 6e 6e 65 63 74 69 6f 6e 3a 20 4b   0..Connection: K
0180 : 65 65 70 2d 41 6c 69 76 65 0d 0a 0d 0a            eep-Alive....
 
0000 : 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d   HTTP/1.1 200 OK.
0010 : 0a 53 65 72 76 65 72 3a 20 4d 69 63 72 6f 73 6f   .Server: Microso
0020 : 66 74 2d 49 49 53 2f 35 2e 30 0d 0a 44 61 74 65   ft-IIS/5.0..Date
0030 : 3a 20 54 68 75 2c 20 30 38 20 4d 61 79 20 32 30   : Thu, 08 May 20
0040 : 30 33 20 30 32 3a 32 30 3a 30 33 20 47 4d 54 0d   03 02:20:03 GMT.
0050 : 0a 43 6f 6e 6e 65 63 74 69 6f 6e 3a 20 63 6c 6f   .Connection: clo
0060 : 73 65 0d 0a 58 2d 50 6f 77 65 72 65 64 2d 42 79   se..X-Powered-By
0070 : 3a 20 50 48 50 2f 34 2e 33 2e 31 0d 0a 43 6f 6e   : PHP/4.3.1..Con
0080 : 74 65 6e 74 2d 74 79 70 65 3a 20 74 65 78 74 2f   tent-type: text/
0090 : 68 74 6d 6c 0d 0a 53 65 74 2d 43 6f 6f 6b 69 65   html..Set-Cookie
00a0 : 3a 20 73 65 73 73 69 6f 6e 68 61 73 68 3d 38 65   : sessionhash=8e
00b0 : 33 32 32 36 63 30 33 32 34 31 31 32 32 66 33 30   3226c03241122f30
00c0 : 31 38 64 64 33 64 64 62 38 61 38 63 37 61 3b 20   18dd3ddb8a8c7a;
00d0 : 70 61 74 68 3d 2f 0d 0a 53 65 74 2d 43 6f 6f 6b   path=/..Set-Cook
00e0 : 69 65 3a 20 62 62 6c 61 73 74 76 69 73 69 74 3d   ie: bblastvisit=
00f0 : 31 30 35 32 33 36 30 34 30 33 3b 20 65 78 70 69   1052360403; expi
0100 : 72 65 73 3d 46 72 69 2c 20 30 37 2d 4d 61 79 2d   res=Fri, 07-May-
0110 : 32 30 30 34 20 30 32 3a 32 30 3a 30 33 20 47 4d   2004 02:20:03 GM
0120 : 54 3b 20 70 61 74 68 3d 2f 0d 0a 43 6f 6e 74 65   T; path=/..Conte
0130 : 6e 74 2d 4c 65 6e 67 74 68 3a 20 36 39 31 37 33   nt-Length: 69173
0140 : 0d 0a 43 6f 6e 74 65 6e 74 2d 45 6e 63 6f 64 69   ..Content-Encodi
0150 : 6e 67 3a 20 67 7a 69 70 0d 0a 56 61 72 79 3a 20   ng: gzip..Vary:
0160 : 41 63 63 65 70 74 2d 45 6e 63 6f 64 69 6e 67 0d   Accept-Encoding.
0170 : 0a 0d 0a                                          ...
 
This was followed by roughly 10000 bytes of body data before the connection was closed.

----------------------------------- 

My guess (as I am not familiar with PHP or vBulletin) is that vBulletin (as an example; I suspect this applies to all applications using similar techniques) outputs content in this form:

----------------
Content-Length: 60000
 
<60000 bytes of data>
----------------
 
which is then handled by ob_gzhandler, with the result being
 
----------------
Content-Length: 60000
Content-Encoding: gzip
 
<10000 bytes that are the compressed version of the 60000 bytes of data>
----------------
 
In that case, according to the Content-Length header, 50000 bytes are missing from the response.
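To illustrate the arithmetic, here is a minimal Python sketch (not part of the original report) that gzip-compresses a hypothetical, repetitive HTML body and compares the length a script would have declared with the number of bytes actually sent. The body content is made up, so the exact sizes will differ from the 60000/10000 figures above.

```python
import gzip

# Hypothetical page body: repetitive HTML compresses well, much like a forum page.
body = b"<tr><td>vBulletin thread row</td></tr>\n" * 1500

# The length a script would declare, computed from the *uncompressed* body.
declared_length = len(body)

# What a gzip output handler actually puts on the wire.
compressed = gzip.compress(body)

print("Content-Length header:", declared_length)
print("bytes actually sent:  ", len(compressed))
print("bytes 'missing':      ", declared_length - len(compressed))
```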

This is causing problems for Opera: in these cases Opera assumes there was a network/HTTP pipelining problem and tries to recover by refetching the content one or more times, causing a "flashing" page load as the page is redisplayed several times. The only exception is when a form is submitted; such requests cannot be resent, so Opera instead displays the error message "Connection closed by remote server".

The correct output would be something like:
 
----------------
Content-Length: 10000
Content-Encoding: gzip
 
<10000 bytes that are the compressed version of the 60000 bytes of data>
----------------

I suspect that the global compression handler does not remove the Content-Length header that was in the response it is processing. It should (must) remove that Content-Length header if it is present, and may optionally add its own corrected version of the header to the output, or may let the web server (e.g. Apache) handle the response in the way it handles responses with no specified length (which should either be "Connection: close" or "Transfer-Encoding: chunked"). The way this should be handled really depends on whether or not the compression handler has the entire response at the time it starts generating and sending data.
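A minimal sketch of the behaviour suggested above, written in Python rather than PHP; `gzip_output_handler` is a hypothetical name, not a PHP function. It assumes the handler receives the script's headers and the complete body at once, drops any script-supplied Content-Length (comparing header names case-insensitively), and adds a corrected one for the compressed data.

```python
import gzip


def gzip_output_handler(headers, body):
    """Hypothetical output handler: gzip `body` and keep the headers consistent.

    `headers` is a list of (name, value) pairs as emitted by the script, and
    `body` is assumed to be the complete, uncompressed response body (bytes).
    """
    # A script-supplied Content-Length describes the *uncompressed* body, so it
    # must not be passed through. HTTP header names are case-insensitive.
    new_headers = [(name, value) for name, value in headers
                   if name.lower() != "content-length"]

    compressed = gzip.compress(body)
    new_headers.append(("Content-Encoding", "gzip"))
    # The whole body is buffered here, so the corrected length is known.
    new_headers.append(("Content-Length", str(len(compressed))))
    return new_headers, compressed


# Usage: a script that declared the length of its uncompressed output.
script_headers = [("Content-type", "text/html"), ("Content-Length", "60000")]
out_headers, out_body = gzip_output_handler(script_headers, b"x" * 60000)
print(out_headers)
```

If a handler instead streams compressed data before the whole body is available, the corrected length cannot be known up front; the final `append` would then be omitted, leaving the web server to delimit the body with "Transfer-Encoding: chunked" or "Connection: close".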

-- 
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
******************************************************************** 

History

 [2003-06-09 08:30 UTC] sniper@php.net
Does this happen with PHP 4.3.2 ?

 [2003-06-09 09:01 UTC] yngve at opera dot com
I do not know if this is present in 4.3.2 as I am not using PHP myself, but I do know that the problem is present on servers using PHP 4.2.2 and 4.3.1, and your changelog does not indicate many changes related to compression, and none that I can see are related to this problem.
 [2003-06-09 22:09 UTC] sniper@php.net
We still need a short script to reproduce this;
everything I've tried works just fine.

 [2003-06-12 08:20 UTC] sniper@php.net
There's not much we can do if a script sets a wrong Content-Length header. Not a PHP bug.

 [2003-06-12 10:43 UTC] yngve at opera dot com
As you are, apparently, not willing to enforce correct Content-Length headers I see no option but to make sure that Opera does not trust Content-Length headers served by PHP powered servers.

This will be effective as of v7.20.

Please inform me when you start enforcing RFC 2616 sec. 4.4 and only send or pass through valid Content-Length headers, so that I can remove or limit this special handling of PHP-powered servers.
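For reference, the order in which RFC 2616 sec. 4.4 determines a response's message length can be sketched as below. This is a simplified reading in Python with a hypothetical helper name; it ignores edge cases such as repeated Content-Length fields and request messages. Under these rules, a Content-Length sent together with "Content-Encoding: gzip" has to count the gzip-encoded bytes.

```python
def message_length_rule(status, request_method, headers):
    """Simplified reading of RFC 2616 sec. 4.4 for a *response* message.

    Returns (how_the_body_ends, length_or_None). `headers` maps field names to
    values; names are compared case-insensitively.
    """
    h = {name.lower(): value for name, value in headers.items()}

    # 1. Responses that must not carry a body end at the empty line after the headers.
    if request_method == "HEAD" or 100 <= status < 200 or status in (204, 304):
        return ("no-body", None)

    # 2. A Transfer-Encoding other than "identity" means chunked framing;
    #    any Content-Length received alongside it must be ignored.
    te = h.get("transfer-encoding")
    if te is not None and te.strip().lower() != "identity":
        return ("chunked", None)

    # 3. Otherwise Content-Length gives the number of body octets as sent
    #    (i.e. after any Content-Encoding such as gzip has been applied).
    if "content-length" in h:
        return ("content-length", int(h["content-length"]))

    # 4. The self-delimiting multipart/byteranges media type.
    if h.get("content-type", "").lower().startswith("multipart/byteranges"):
        return ("multipart/byteranges", None)

    # 5. Fallback: the server closing the connection marks the end of the body.
    return ("connection-close", None)


# The response from the hexdump in this report: the client is told to expect
# 69173 octets of gzip data, so a shorter body looks like a transfer failure.
print(message_length_rule(200, "GET", {"Content-Length": "69173",
                                       "Content-Encoding": "gzip"}))
```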
 [2003-06-12 12:15 UTC] derick@php.net
Erm? 1st: it has nothing to do with "the PHP server", as PHP is not a server, but rather a module for a web server. 2nd: we do not set any Content-Length header ourselves. 3rd: PHP is a flexible scripting language that allows users to set (override) Content-Length headers themselves wherever they think it's necessary, even though that might be wrong. AFAIK RFC 2616 is an HTTP RFC; PHP does NOT implement HTTP, but rather makes use of it. So, what is your problem with PHP?
 [2003-06-12 13:51 UTC] yngve at opera dot com
The PHP module may not set any Content-Length headers itself, but it is allowing the scripts to set them. I do not have any real problem with that (CGI scripts also do this). Any problems caused by incorrect content-length headers would be local to that particular script.

The problem is that when the PHP module applies *internal* postprocessing to the data generated by the script, such as is done by ob_gzhandler (and possibly by other output handlers in PHP), the *length* of the output changes. If the script sent an explicit Content-Length header, that header is no longer valid and should have been removed by your output handler.

PHP is producing content (header + data) that is intended to be transmitted over an HTTP channel, and is therefore IMO affected by the RFC 2616 requirements.

When PHP (or any other serverside module or proxy) modifies HTTP data (apart from the original construction of header + data when processing the script) it becomes PHP's responsibility (to the best of its ability) to ensure that the headers it forwards are still consistent with the new version of the data. That includes (in this case) not just adding a "Content-Encoding: gzip" header, but removing or editing the "Content-Length" header as well.

This is not really a question of the script setting a correct or incorrect header, but of the PHP postprocessing step allowing an obviously incorrect header value to be passed on to the client.

Because, as far as I can see, this is a general problem with PHP, and because PHP is used by a large number of sites, this Content-Length header problem is causing problems for many of Opera's users. I therefore have to take steps to reduce the actual visible signs of the problem (repeated "flashing" downloads of the page due to our HTTP pipeline error recovery heuristics, or "Connection closed by server" errors). 

In this case, as it does not appear that you are willing to prevent the problem, the only two options open to me are either to disable "Accept-Encoding", or to not completely trust the Content-Length headers when communicating with servers that can be identified as using PHP. The first may increase network load unless the server uses Transfer-Encoding. The second has the advantage that compression can still be used, and if the problem is fixed later, Opera can still use persistent connections to the server (which is in any case not possible as long as the server sends incorrect Content-Length headers).

As I already had code in place for the second option, due to similar problems with PHP 4.0.4-4.0.6, I chose to expand that code to cover all PHP versions. This code does not send any new request to the server until Opera has received all the bytes of data it expects based on the Content-Length, even if the server claims to be pipeline-capable. If it does not receive all the bytes before the connection closes, it assumes that it really did get all the data and does not "make a fuss" about the missing data.
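The workaround described above might be sketched as follows. This is only an illustration in Python, not Opera's code: both function names are hypothetical, and recognising PHP through the `X-Powered-By` header (as seen in the response hexdump in this report) is an assumption, since the report does not say how Opera identifies PHP servers.

```python
def looks_like_php(headers):
    # Assumption: PHP is recognised by an "X-Powered-By: PHP/..." header, as in
    # the response hexdump in this report. Keys are lower-cased field names.
    return headers.get("x-powered-by", "").lower().startswith("php/")


def on_body_progress(headers, received, connection_closed):
    """Hypothetical client-side check while reading a response body.

    Returns "complete", "keep-reading", "accept-as-complete" or "retry".
    """
    declared = int(headers["content-length"])
    if received >= declared:
        return "complete"
    if not connection_closed:
        # Keep reading. (Per the report, Opera also holds back further
        # pipelined requests until the declared length has arrived; that
        # request scheduling is not modelled here.)
        return "keep-reading"
    if looks_like_php(headers):
        # Short body from a PHP server: assume it was all the data and do not
        # "make a fuss" (no refetch, no error message).
        return "accept-as-complete"
    # For other servers a short body is treated as a pipelining/network error.
    return "retry"


php_headers = {"content-length": "69173", "x-powered-by": "PHP/4.3.1"}
print(on_body_progress(php_headers, 10000, connection_closed=True))
```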

I do not like adding special handling for specific server software (broadly defined), but in a number of cases it has been necessary in order to handle known problems of a general nature with a particular server software that cannot be handled properly, or at all, by heuristics. As soon as the cause of the problem is removed, I remove, or limit the scope of, the special handling.
 
PHP Copyright © 2001-2020 The PHP Group
All rights reserved.
Last updated: Sat Sep 26 00:01:24 2020 UTC