php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #75584 Docs?: stream_select() ignores proc_open() streams for buffered fopen() streams
Submitted: 2017-11-28 01:11 UTC Modified: 2022-12-28 17:42 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: asmqb7 at gmail dot com Assigned: bukka (profile)
Status: Analyzed Package: Streams related
PHP Version: 7.1.12 OS: Arch Linux=Y Windows=? macOS=?
Private report: No CVE-ID: None
View Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
If you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: asmqb7 at gmail dot com
New email:
PHP Version: OS:

 

 [2017-11-28 01:11 UTC] asmqb7 at gmail dot com
Description:
------------
Sorry for the terse title - there were 0 characters left in the input field (but it's cool when it fits perfectly...)

I've discovered an interesting edge-case stream buffering glitch.

- I opened a file using fopen() and started a process using proc_open(). I wanted to read the file and write its contents into the process's standard input.

- To keep my I/O loop clean, I used stream_select() to marshal/multiplex reading and writing between the file stream and the process stdin/stdout/stderr streams.

- For some reason, my script did not begin writing to the process stdin until it had completely read the file contents. (?!)

- Investigation showed that stream_select() was not reporting the stdin stream as writable as long as the file stream was being passed to stream_select() (my code stopped passing the file stream when it detected EOF on the file).

It took me some time to wrap my head around the fact that that stream_select() was not acknowledging a writeable stream passed in the $write array as writeable, for as long as I was passing in an entirely unrelated stream (from a random disk file) into the $read array.

I initially discovered that setting the stream_select() timeout to 0 (as a hack) and then manually adding the file stream to the $read array _after_ stream_select() made the script consistently work perfectly (as my code looped through the readable arrays and did the correct things). But (as far as I understand) a timeout of 0 effectively removes the buffer-ratelimiting advantages of using stream_select() in the first place so I didn't want to use this workaround.

Intending to report a bug with the info I'd gathered, I asked in ##php on freenode about the 20-line limit for testcases, as it's hard to write stream-management PHP code in less than 20 lines without golfing, and I really didn't want to add external links that would invariably eventually die. (The reached consensus was that if a concise, minimal testcase was >20lines, then it shouldn't be the end of the world to attach the whole thing.)

While in ##php someone gave some consideration to my repro testcase and pointed me toward https://github.com/php/php-src/blob/dcdd4abdb2eb622c72c0b271e4f3a173ca8e6765/ext/standard/streamsfuncs.c#L822-L836. I'll quote the comment from that point in the source:

/* slight hack to support buffered data; if there is data sitting in the
 * read buffer of any of the streams in the read array, let's pretend
 * that we selected, but return only the readable sockets */

This gave me the idea to go digging to see if there were any buffer-control functions, discover there were, and rapidly find that

  stream_set_read_buffer($file, 0);

on the file stream completely rectified the issue I was having! Obviously this is the clean, universal solution to this problem I was looking for - my thanks to the person who helped me on IRC! (Around Mon Nov 24 2017 1:24AM UTC, for reference, and I was i336_.)

The attached testcase perfectly reproduces this issue (and the fix) on Arch Linux, but considering the abstract nature of the stream subsystem, testing on Windows and other UNIX (*BSD, IllumOS, macOS, etc) is probably a very good idea, and I'm interested to find out what the status of this issue is on various platforms. (I incidentally only have Linux machines at my disposal at the moment.)

After having gone round in circles with this for a couple days (mercifully not very long) I share the conclusion reached by the person who helped me on IRC, which is that this is a documentation problem - it's obvious to me that all the language components/features are "working as intended" individually, but when put together violate the principle of least surprise somewhat. I'll be linking to this bug in a few comment sections, but this is enough of a gotcha that adding it where everyone will invariably see it is probably a very good idea.

I know I can queue documentation changes but I'm not really sure how to go about adding info about this and would prefer to defer to other(s) with more experience with documenting and who have a better idea of the right spot to put this.

Beyond documentation, I added a "?" after "Docs" in the summary/subject because I'm not sure if this would be classified as a language bug to some extent as well (something something "slight hack"). I can see a clear argument that things are _technically_ working as intended, but still... :)

Test script:
---------------
<?php

// See "Actual Results" section for description of code and (extremely simple) usage directives

$file_bufsize = 64;
$stdin_bufsize = 1024;
$stdout_bufsize = 1024;

$delayms = 10000;

$inputfile = 'input.txt';

$file = fopen($inputfile, 'r');
$filesize = fstat($file)['size'];

for (;;) {
	
	$proc = proc_open("cat",
		[['pipe', 'r'], ['pipe', 'w'], ['pipe', 'w']], $pipes);
	
	/*1*/ #stream_set_read_buffer($file, 0);
	
	$readbytes = $proc_in = $proc_out = $proc_err = $zerowrites = 0;
	$proc_sendq = ""; 
	$stdin_done = false;
	
	print "starting\n";
	
	for (;;) {
		
		$read = $write = $except = [];
		
		foreach ([1, 2] as $i) if (!feof($pipes[$i])) $read[] = $pipes[$i];
		
		if (!$stdin_done && $proc_sendq != "") $write[] = $pipes[0];
		
		/*2*/ if ($filesize - $readbytes > 0) $read[] = $file;
		
		dump_fds($read, $write, $pipes, $file);
		print "=>  <sel";
		if ($read || $write) {
			/*3a*/ stream_select($read, $write, $except, null);
			#/*3b*/ stream_select($read, $write, $except, 0);
		}
		print "ect>  =>  ";
		dump_fds($read, $write, $pipes, $file);
		print "\n";
		
		foreach ($read as $rfd) {
			switch ($rfd) {
				case $file:
					// file input
					$data = fread($file, min($file_bufsize, $filesize - $readbytes));
					$readbytes += strlen($data);
					$proc_sendq .= $data;
					break;
				case $pipes[1]:
				case $pipes[2]:
					// process stdout or stderr (which is input from PHP's POV)
					// cat never writes stderr so use stdout's bufsize
					$l = strlen($data = fread($rfd, $stdout_bufsize));
					if ($rfd == $pipes[1]) {
						$proc_out += $l; $s = "out";
					} else {
						$proc_err += $l; $s = "err";
					}
					print "\nstd".$s.": \"".$data."\"\n\n";
			}
		}
		
		// only $pipes[0] will ever be in the $write array
		if (in_array($pipes[0], $write) && $proc_sendq != "") {
			$len = fwrite($pipes[0], $proc_sendq, $stdin_bufsize);
			if ($len == 0) {
				if ($zerowrites++ == 5) {
					// note zerowrites is zeroed just below; this counts
					// a series of successive write failures as fatal.
					print "\n\nI/O error\n\n";
					die;
				}
			} else {
				$proc_in += $len;
				$zerowrites = 0;
				$proc_sendq = substr($proc_sendq, $len);
				if ($proc_in == $filesize) {
					fclose($pipes[0]);
					$stdin_done = true;
					print "\nstdin done\n\n";
				}
			}
		}
		
		print "file($readbytes)  buf:".strlen($proc_sendq)
			."  proc(I:$proc_in O:$proc_out E:$proc_err)\n";
		
		// if process has quit AND its stdout/stderr are closed, it's done.
		if (!proc_get_status($proc)['running'] && feof($pipes[1]) &&
			feof($pipes[2])
		) { print "\n\nbreaking\n"; break; }
		
		usleep($delayms);
		print "\n";
	}
	print "\nloop finished => proc_close()\n";
	fclose($pipes[1]); fclose($pipes[2]); proc_close($proc);
}
/* Aesthetic printing function, buried at the bottom to improve
 * general readability. Implementation unimportant */
function dump_fds($read, $write, $pipes, $file) {
	foreach (['R' => $read, 'W' => $write] as $label => $fds) {
		print $label.": [ ";
		foreach ($fds as $f) {
			print ($f == $file ? 'file' : ['in', 'out', 'err']
				[array_search($f, $pipes)]).'#'.(int)$f.' ';
		}
		print "]  ";
	}
}

Expected result:
----------------
I'm not sure what to expect - this issue may ultimately be judged to be working as intended, or it may be acknowledged as a bug.

(Not going to duplicate problem description for this field)

Actual result:
--------------
The attached testcase launches "cat", opens a file of data, pushes that data into cat's stdin, reads cat's stdout, and prints the data from stdout to the screen. It also displays debugging commentary. Its output (there is a lot) is ratelimited (by usleep()ing for $delayms set at the top of the script) so it won't overload the terminal/console.

You need a file to open. Any existing bit of data will do, as long as it can be sanely printed to stdout. ~40KB of input combines well with the 10µs delay I'm using.

In case it's useful: (written for Linux, unsure if these work the same on *BSD et al)

$ cat /dev/urandom | od -An -vtx1 | head -c 40960 > input.txt

My apologies the testcase is long. Only a couple of lines need to be carefully scrutinized/reasoned about. The total length is primarily due to the scaffolding required to read from a file and push it into a pipe. I have absolutely no idea how to trigger the described failure mode with a shorter sequence of code - I expect there is a shorter reproduction case, but I have no idea how I'd go about finding it.


I'll quote a bit of script output so I can point out a few things.

---------------------------------------------------------------------------

R: [ out#7 err#8 file#5 ]  W: [ in#6 ]  =>  <select>  =>  R: [ file#5 ]  W: [ ]  
file(65344)  buf:64896  proc(I:448 O:384 E:0)

R: [ out#7 err#8 file#5 ]  W: [ in#6 ]  =>  <select>  =>  R: [ file#5 ]  W: [ ]  
file(65408)  buf:64960  proc(I:448 O:384 E:0)

R: [ out#7 err#8 file#5 ]  W: [ in#6 ]  =>  <select>  =>  R: [ file#5 ]  W: [ ]  
file(65472)  buf:65024  proc(I:448 O:384 E:0)

R: [ out#7 err#8 file#5 ]  W: [ in#6 ]  =>  <select>  =>  R: [ file#5 ]  W: [ ]  
file(65535)  buf:65087  proc(I:448 O:384 E:0)

R: [ out#7 err#8 ]  W: [ in#6 ]  =>  <select>  =>  R: [ out#7 ]  W: [ in#6 ]  

stdout: "8 8a 7b
 b8 ef 04 03 87 f5 82 b9 b2 7e 82 db 6e 15 b4 2f
 ab e2 "

file(65535)  buf:65023  proc(I:512 O:448 E:0)

R: [ out#7 err#8 ]  W: [ in#6 ]  =>  <select>  =>  R: [ out#7 ]  W: [ in#6 ]  

stdout: "3b b3 16 e5 1f ef f2 a4 95 6e 53 54 32 22
 a1 c4 d6 96 a0 07 a2 "

file(65535)  buf:64959  proc(I:576 O:512 E:0)

R: [ out#7 err#8 ]  W: [ in#6 ]  =>  <select>  =>  R: [ out#7 ]  W: [ in#6 ]  

stdout: "90 7d 4c 80 98 03 c3 63 78
 e1 7e 23 1d 4c 00 43 a3 22 52 46 d6 "

file(65535)  buf:64895  proc(I:640 O:576 E:0)

R: [ out#7 err#8 ]  W: [ in#6 ]  =>  <select>  =>  R: [ out#7 ]  W: [ in#6 ]  

stdout: "6a 49 b2 09
 74 e7 44 b3 f2 11 aa 0e db a0 74 28 05 fd bc 32
 56"

---------------------------------------------------------------------------

- The hexadecimal output is meaningless and is the contents of my test file.

- The <select> lines show the streams being passed to stream_select(), and then what is being passed back by stream_select() when it's done. The stream IDs are annotated with "file", "in", "out" and "err". I think it's worth noting/remembering that stdin goes in the writeable array.

- The lines like "file(65535)  buf:64895  proc(I:640 O:576 E:0)" are saying that 64K has been read from the file; 64895 bytes are in $proc_sendq (which is .= appended to by file reads, and substr()ed from [the beginning of] when stdin is writeable); and that the "proc"ess has had 640 bytes written to its stdin, 576 bytes read from its stdout and 0 bytes read from its stderr.

- The first four iterations shown here describe that the file is repeatedly being read in without much attention being paid to writing stdin. Once the file hits exactly 64K, the script hits file EOF, removes the file stream from the $read array, and then begins to write consistently to cat's stdin on every loop (and subsequently read from cat's stdout and display data). What's clearly visible here is the cutoff point where "file#5" stops being passed to stream_select(), and "in#6" immediately shows up consistently after the stream_select() as a direct result.


I think the following may be a significant area to investigate: you may have noticed that the four iterations at the top indicate (with "proc(I:448 O:384 E:0)") that cat _does_ actually have a tiny bit of data written to its stdin and read from its stdout. This is true. What's actually happening is that...

---------------------------------------------------------------------------

R: [ out#7 err#8 file#5 ]  W: [ in#6 ]  =>  <select>  =>  R: [ file#5 ]  W: [ ]  
file(8192)  buf:8192  proc(I:0 O:0 E:0)

R: [ out#7 err#8 file#5 ]  W: [ in#6 ]  =>  <select>  =>  R: [ file#5 ]  W: [ in#6 ]  
file(8256)  buf:7232  proc(I:1024 O:0 E:0)

R: [ out#7 err#8 file#5 ]  W: [ in#6 ]  =>  <select>  =>  R: [ file#5 ]  W: [ ]  
file(8320)  buf:7296  proc(I:1024 O:0 E:0)

---------------------------------------------------------------------------

...every few thousand iterations, PHP randomly decides the stdin stream is writeable. (You have to not blink in order to notice the "in#6" fly past and hit ^C!)

IMO, 8192 bytes read from the file is a really odd point for the first stdout write to be happening on/near/just after. I'm not sure how to interpret this but I'm sure it's somehow significant and worth mentioning.

Adjusting the various $*_bufsize values will control the frequency at which these rare-ish stdin writes occur. In particular adjusting the $file_bufsize will make the most difference; small multiples such as 128, 256, 512 or 1024 are recommended (at 1024 it reads the whole file pretty quickly).

Regarding the defaults I've set: in any other circumstance, having a $file_bufsize of 64 and a $stdin_bufsize of 1024 would be useless: if the source is reading 64 bytes at a time then sinking with a larger buffer size is going to have no effect. It does have an effect in this situation because PHP only decides stdin is writeable "every once in a while" - so when it does eventually write, it's built up enough data to easily write 1024 bytes.


Final notes:

- To reproduce my initial hack/workaround, move the line marked with /*1*/ immediately below the if(){} block containing the stream_select() calls, and switch the stream_select() from a null timeout to a 0 timeout.

- The rectifying stream_set_read_buffer() call is near the top, commented out.


Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2022-12-28 17:42 UTC] bukka@php.net
-Status: Open +Status: Analyzed -Assigned To: +Assigned To: bukka
 [2022-12-28 17:42 UTC] bukka@php.net
Apology for taking so long to look into this. After investigation and some debugging I think the current behaviour is not optimal and we should optimize it. I would not consider it as a bug but more as an improvement. This is due to early returned as correctly linked in the code. I have created a new GitHub issue describing the way how this should be optimized: https://github.com/php/php-src/issues/10177
 [2023-04-05 08:25 UTC] educationblog86 at gmail dot com
Online Education Books are sharing about school, college, university, exam, govt job notification, courses, etc. More info to visit: (https://onlineeducationbooks.com)github.com
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 14 10:01:28 2024 UTC