Bug #67352 HTTP 1.1 causes timeouts for series of files
Submitted: 2014-05-28 02:59 UTC Modified: 2021-08-19 13:49 UTC
Votes:1
Avg. Score:1.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: Andy_Schmidt at HM-Software dot com Assigned: cmb (profile)
Status: Closed Package: Streams related
PHP Version: 5.4.28 OS: Windows 2012
Private report: No CVE-ID: None
 [2014-05-28 02:59 UTC] Andy_Schmidt at HM-Software dot com
Description:
------------
The manual states in the first sentence: "Allows read-only access to files/resources via HTTP 1.0, using the HTTP GET method", however, the HTTP context options "method" and "protocol_version" clearly allow setting a HEAD method and version 1.1.

However, when a series of files is processed with an fopen() "HEAD" request (to obtain the file name, file type and timestamp) followed by a "GET" request, then for some web servers the script will eventually start timing out for each file.

I don't know enough about the internals of PHP, but I wonder if the problem is that with each successive HTTP 1.1 request, another connection is opened - and then kept open (in line with protocol version 1.1) - preventing subsequent accesses until that previously used connection has timed out.

Just commenting out protocol_version => 1.1 will circumvent the problem.

Test script:
---------------
$arrValidTypes = array ( 'gif', 'jpg', 'jpeg', 'png', 'psd', 'bmp', 
	'tiff', 'tif', 'jpc', 'jp2', 'jpf', 'jb2', 'swc',
	'aiff', 'wbmp', 'xbm');

// Make HEAD request to get just metadata on submitted file
$arrRequestHeaders = array(
	'http'=>array(
		'method'		=>'HEAD',
		'protocol_version'	=>1.1,
		'follow_location'	=>1,
		'header'=>	"User-Agent: Anamera-Feed/1.0\r\n" .
	        	  	"Referer: $source\r\n" 
		)
	);
$rc = @fopen( $fURI, 'r', false, stream_context_create($arrRequestHeaders) );
		
if ( $http_response_header ) {
	for ( $i = sizeof($http_response_header) - 1; $i >= 0; $i-- ) {
		if ( preg_match('@^http/\S+ (\S{3,}) (.+)$@i', $http_response_header[$i], $http_status) > 0 ) {
			// HTTP Status header means we have reached beginning of response headers for last request
			break;
		}
		elseif ( preg_match('/^(\S+):\s*(.+)\s*$/', $http_response_header[$i], $arrHeader) > 0 ) {
			switch ( $arrHeader[1] ) {
				case 'Content-Type':
					if ( !isset($http_content_image_type) ) {
						if ( preg_match('@^image/(\w+)@ims', $arrHeader[2], $arrTokens) > 0 ) {
							if ( in_array(strtolower($arrTokens[1]), $arrValidTypes)) {
								$http_content_image_type = $arrTokens[1];
								break;
							}
						} 
						throw new Exception( "Error inspecting file: $fURI; invalid content type: $arrHeader[2]", 2);
					}
					break;
				case 'Content-Disposition':
					if ( !isset($http_content_filename) && preg_match('/filename\\s*=\\s*(?|"([^"]+)"|([\\S]+));?/ims', $arrHeader[2], $arrTokens) > 0 ) {
						$http_content_filename = basename($arrTokens[1]);
					}						
					break;
			}
		}
	}
}

// Generate target file name
$path_parts = pathinfo( $http_content_filename );
$fName = $path_parts['filename'];
if ( in_array(strtolower($path_parts['extension']), $arrValidTypes) ) {
	$fExt = $path_parts['extension'];

if ( !$fExt ) {
	// Default extension to Content-Type header, or 'jpg' as last resort
	$fExt = isset($http_content_image_type) ? $http_content_image_type : 'jpg';
}
		
$arrRequestHeaders = array(
	'http'=>array(
		'method'		=>'GET',
		'protocol_version'	=>1.1,
		'follow_location'	=>1,
		'header'=>	"User-Agent: Anamera-Feed/1.0\r\n" .
	        	  	"Referer: $source\r\n" 
			)
		);
$rc = copy( $fURI, $target_file, stream_context_create($arrRequestHeaders) );


 [2014-05-28 04:22 UTC] rasmus@php.net
-Status: Open +Status: Feedback
 [2014-05-28 04:22 UTC] rasmus@php.net
I really don't think this is a PHP issue. You are most likely hitting the fact that HTTP/1.1 connections default to be persistent (keep-alive) so they are sitting there waiting for the keep-alive timeout. Send a "Connection: close" header to test that theory.
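
To try the suggestion above, the request headers from the test script can be extended with a "Connection: close" line. The following is only a minimal sketch: build_http_options() is a hypothetical helper that is not part of the original script, and the URLs in the usage comment are placeholders.

```php
<?php
// Hypothetical helper (not from the original script): builds the options
// array for stream_context_create() and asks the server to close the
// connection after each response.
function build_http_options($method, $referer)
{
    return array(
        'http' => array(
            'method'           => $method,
            'protocol_version' => 1.1,
            'follow_location'  => 1,
            'header'           => "User-Agent: Anamera-Feed/1.0\r\n" .
                                  "Referer: $referer\r\n" .
                                  "Connection: close\r\n",
        ),
    );
}

// Usage (placeholder URLs):
// $ctx = stream_context_create(build_http_options('HEAD', 'http://example.com/'));
// $rc  = @fopen('http://example.com/image.jpg', 'r', false, $ctx);
```

If the timeouts disappear with this header in place, that would support the keep-alive theory; it does not by itself show how PHP handles the connection internally.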
 [2014-05-28 06:46 UTC] aharvey@php.net
I believe this is a duplicate of request #65634 — it was implemented on PHP-5.6 and master, but I elected not to backport it due to the (slight) behavioural change.
 [2014-05-28 12:46 UTC] Andy_Schmidt at HM-Software dot com
-Status: Feedback +Status: Open
 [2014-05-28 12:46 UTC] Andy_Schmidt at HM-Software dot com
Agreed - sounds related to request #65634.
Reading your comments (and given a nod to my complete lack of understanding of the underpinnings in PHP), I wonder if this may also be a documentation "problem".

Two points:

a) The documentation http://www.php.net/manual/en/wrappers.http.php does seem misleading in that it implies that function is restricted to HTTP 1.0 GET. May I suggest a slight rewording:
"Allows access to files/resources via HTTP. By default, a HTTP 1.0 GET is used."

b) http://www.php.net/manual/en/context.http.php allows setting protocol_version 1.1; however, it doesn't elaborate on (or warn about) whether that requires any special accommodation by the script programmer.

Reading your (much appreciated) feedback, I now wonder how HTTP 1.1 is implemented in the streams/HTTP contexts with respect to managing the open connection(s) and possible server connection limits. If a script makes 50 x 2 HTTP 1.1 requests against the SAME web server, each set being an fopen() HEAD request with a new stream_context_create that (among other things) sets protocol_version 1.1, followed by another stream_context_create and copy() GET request with protocol_version 1.1 - how are the keep-alive connections handled?

If each fopen()/copy() reuses the SAME open connection (as I had naively assumed), then there should be no "timeout" consideration - but it should save on TCP setup/teardown times. Of course, I do (now) understand that I should send an explicit "Connection: Close" header with the final iteration of the loop, before proceeding on with a different web server.

But - after reading your comments I wonder whether the HTTP 1.1 implementation in the HTTP contexts will simply open a NEW connection (just like HTTP 1.0 does) - but then leave it open... never to be re-used. This would explain why eventually a connection limit implemented on the server might be reached, causing subsequent requests to have to wait for time-outs.

If the LATTER is the case, then I believe it requires appropriate mention in the documentation that protocol_version 1.1 must not be set unless the script programmer manages the open connections themselves (and maybe a link to the relevant chapter in the documentation).

For example - does it require that only ONE stream_context is created for that server because THAT resource actually holds the persistent HTTP 1.1 connection, and that this ONE context be passed to all fopen(), copy() etc. requests in order to actually reuse the SAME connection? In that case any unique headers (such as "if-modified-since") would have to be "modified" (rather than "created") in that existing stream_context by means of stream_context_set_option?

If THAT is the case, then the solution to #65634 actually does NOT seem appropriate to me. If the above is properly documented, then HTTP 1.1 implies KEEP-ALIVE - which probably would be the key reason why the script writer explicitly went through all the effort to set protocol_version 1.1 in the first place. Negating that with a "Connection: Close" and forcing the script author to also specify "Connection: Keep-Alive" seems like a counter-intuitive twist.

Rather than turning off persistent connections by default, it would seem more appropriate to document how a script author has to manage/reuse the persistent connection that HTTP 1.1 opens.
 [2021-08-19 13:49 UTC] cmb@php.net
-Status: Open +Status: Closed -Assigned To: +Assigned To: cmb
 [2021-08-19 13:49 UTC] cmb@php.net
> "Allows access to files/resources via HTTP. By default, a HTTP
> 1.0 GET is used."

Makes perfect sense.  Thanks!  Committed[1].

Regarding the HTTP/1.1 behavior: Adam's fix was mainly to send a
Connection: close header with the request automatically.  If the
server heeds it, the situation is similar to HTTP/1.0.  If the
server doesn't, you may hit bug #80931.  Either way, there is
nothing that userland can do about that.

I think it's best to close this ticket; the general issue is still
tracked by the other bug report, and if it is decided to change
that to a doc problem, further enhancements/fixes to the docs can
still be done.

[1] <https://github.com/php/doc-en/commit/ebb0c383f4b954438eb72002ef4b84edd67d32ca>
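
Whether a given server heeds the "Connection: close" request header can be checked, roughly, by looking at the Connection header of the last response. The sketch below assumes the header lines come from $http_response_header (which also holds the headers of any redirects followed); last_connection_header() is a hypothetical helper and not part of PHP.

```php
<?php
// Hypothetical helper: returns the lower-cased value of the Connection
// header from the last response in a list of header lines, or null when
// that response carried no Connection header.
function last_connection_header(array $lines)
{
    $value = null;
    foreach ($lines as $line) {
        if (preg_match('@^HTTP/\S+\s+\d{3}@i', $line)) {
            // A status line starts a new response (e.g. after a redirect),
            // so forget anything seen in the previous response.
            $value = null;
        } elseif (preg_match('/^Connection:\s*(.*?)\s*$/i', $line, $m)) {
            $value = strtolower($m[1]);
        }
    }
    return $value;
}

// Usage, after an fopen() or copy() call over http://
// $conn = last_connection_header($http_response_header);
```

A result of 'close' suggests the server intends to close the connection after the response; null or 'keep-alive' on an HTTP/1.1 response suggests it may keep the connection open.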
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Nov 23 10:01:28 2024 UTC