Bug #79222 unread_bytes do not work
Submitted: 2020-02-04 13:13 UTC Modified: 2020-02-05 11:13 UTC
From: skullnobrains at gmail dot com Assigned:
Status: Open Package: Sockets related
PHP Version: 7.2.27 OS: linux
Private report: No CVE-ID: None
 [2020-02-04 13:13 UTC] skullnobrains at gmail dot com
Description:
------------
i am directly connecting to a mysql server using stream socket functions.

when there is something to read, stream_get_meta_data() reports unread_bytes as 0; when there is nothing to read, it reports 255.

i tried multiple 7.x versions and ended up with similar results : 7.2.7 , 7.3 , and a few others



Test script:
---------------
function my_msg_r($sock){
	echo "===== READ =====\n";
	var_dump(stream_get_meta_data($sock)['unread_bytes']);
	$header=stream_socket_recvfrom($sock, 4, STREAM_PEEK);
	my_od($header);
	if(4!==strlen($header))
		return my_wrn("failed to read a complete packet header");
	$len=my_str2int(substr($header,0,3));
	if($len===0xffffff)return my_wrn("i do not handle packets longer than 2pow24 yet, sorry");
	
	printf("READ	LEN	%s	0x%x\n",$len,$len);
	printf("READ	SEQ	%u\n",ord($header[3]));
	
	/* TODO CHECK THE SEQUENCE NUMBER : THE CALLER SHOULD PASS THE EXPECTED SEQ NUMBER AND WRITE AN OUT OF ORDER ERROR IF SOMETHING GOES WRONG */
	
	/* TODO CHECK WE HAVE ENOUGH UNREAD BYTES OR AT LEAST WHAT LENGTH WE MANAGED TO READ */	
	
	//if($n===0xffffff)return fread($sock,$n).my_msg_r($sock);	# TODO proper error handling + this is not compatible with async operations
	return substr(fread($sock,$len+4),4); # TODO proper error handling !
}

/* i am not including my_od, which merely produces the lines starting with od : one line per byte with the offset, decimal value, ascii character, and binary value. it has no relation to the bug. this looks like an old 5.6 bug i stumbled upon many years ago, which was fixed at the time */

Actual result:
--------------
===== READ =====
int(0)
od 0	7		111
od 1	0		0
od 2	0		0
od 3	2		10
READ	LEN	7	0x7
READ	SEQ	2
CONNECTED AND AUTHENTICATED : Resource id #2
===== WRITE =====
od 0	15		1111
od 1	0		0
od 2	0		0
od 3	0		0
WRITE	LEN=15
WRITE	SEQ=0
===== READ =====
int(0)
od 0	1		1
od 1	0		0
od 2	0		0
od 3	1		1
READ	LEN	1	0x1
READ	SEQ	1
field count=1
===== READ =====
int(255)

... at this point, i hit ctrl+c because there is nothing to read and there won't be

 [2020-02-04 13:39 UTC] nikic@php.net
unread_bytes is the number of bytes stored in PHP's internal read buffer. It does not indicate whether data is available to read or not. That's what stream_select() is for.

The documentation has this peculiar note for this value:

> Note: You shouldn't use this value in a script.
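
A minimal sketch of that distinction, assuming a Linux build where stream_socket_pair() is available. The socket pair and the variable names are illustrative and not taken from the report. The idea is that unread_bytes stays at 0 while data only sits in the kernel's socket buffer, and changes only once a read has pulled data into PHP's own buffer.

```php
<?php
// Illustrative only: a local socket pair stands in for the MySQL connection.
[$a, $b] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

fwrite($a, "0123456789");   // 10 bytes are now queued in the kernel for $b

// Nothing has been read yet, so PHP's internal read buffer is still empty.
$before = stream_get_meta_data($b)['unread_bytes'];

// stream_select() is the call that reports whether the socket is readable.
$read = [$b]; $write = null; $except = null;
$ready = stream_select($read, $write, $except, 1);

// A short fread() lets PHP fetch what is available (up to its chunk size)
// into its own buffer and return only the 4 requested bytes.
$head  = fread($b, 4);
$after = stream_get_meta_data($b)['unread_bytes'];

var_dump($before, $ready, $head, $after);
```

With the default buffering, $before should be 0 even though data is waiting, and $after should count only the bytes PHP fetched but has not yet handed to the script.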
 [2020-02-04 14:18 UTC] skullnobrains at gmail dot com
ok, duly noted. it did work reliably at some point though ( and strangely still does in some cases )

this explains why the number of unread bytes would be zero when data just freshly arrived into the network buffer.

but that does not really explain why the internal buffer would contain 255 characters when it is actually empty ... ?

not sure it is a bug. it is quite painful not to be able to determine the number of bytes that can be read before reading them any more other than with PEEK. 

thanks for your time

best regards
 [2020-02-04 14:33 UTC] nikic@php.net
stream_socket_recvfrom() is a raw socket operation, it will perform a direct read from the stream and bypass PHP's internal read buffer. I would expect that if unread_bytes=255 and you did an fread() rather than stream_socket_recvfrom(), then that would return something. Could you test whether it does?

> not sure it is a bug. it is quite painful not to be able to determine the number of bytes that can be read before reading them any more other than with PEEK.

That's not how you would typically do it (first time I hear about PEEK...) Normally you set the stream into non-blocking mode and then use stream_select() to determine whether it can be read. In non-blocking mode, fread() will return as much as can be read (which might be an empty string), but will not block the connection.
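
A rough sketch of that pattern, again using a local socket pair as a stand-in for the real connection (the pair, the 1 second timeout and the variable names are assumptions made for illustration):

```php
<?php
// Illustrative only: $server plays the remote peer, $client is the stream we read from.
[$client, $server] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
stream_set_blocking($client, false);   // reads on $client never block from here on

fwrite($server, "hello");

// Wait up to 1 second for $client to become readable.
$read = [$client]; $write = null; $except = null;
$ready = stream_select($read, $write, $except, 1);

$first  = fread($client, 8192);   // whatever is available right now
$second = fread($client, 8192);   // nothing left: comes back at once with no data

var_dump($ready, $first, $second);
```

In a real loop the second fread() would not be issued; the script would go back to stream_select() and wait for the next readable event.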
 [2020-02-04 14:59 UTC] skullnobrains at gmail dot com
i was actually performing a bunch of tests along those lines

-> fgetc and other such functions do work as expected
in that case the first char was ascii code 75 ( letter K ) so nothing very specific
-> reading a single byte with stream_socket_recvfrom fails w/o PEEK

if i set the socket to non-blocking mode, both funcs return a single character without issue but not the same one !?

regarding PEEK, there are a number of possible reasons to use it such as leaving incomplete data in system buffers rather than allocating and managing new ones. in my specific case, the beginning of each packet contains its size and i want to read a single packet at a time without accidentally reading the beginning of a different packet.
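
For comparison, one way to get packet-at-a-time reads without PEEK is to keep a buffer in the script and cut complete packets off its front. The sketch below is only an illustration: take_packets() is a hypothetical helper, and it assumes the framing visible in the test script (3-byte little-endian length, then a 1-byte sequence number). It does not handle the 0xffffff continuation case.

```php
<?php
// Hypothetical helper (not from the report): remove every complete packet
// from the front of $buf and return them; leftover bytes stay in $buf.
function take_packets(string &$buf): array
{
    $packets = [];
    while (strlen($buf) >= 4) {
        // Pad the 3 length bytes to 4 so unpack('V') reads them as little-endian.
        $len = unpack('V', substr($buf, 0, 3) . "\0")[1];
        if (strlen($buf) < 4 + $len) {
            break; // payload incomplete: wait for more bytes
        }
        $packets[] = ['seq' => ord($buf[3]), 'payload' => (string) substr($buf, 4, $len)];
        $buf = (string) substr($buf, 4 + $len);
    }
    return $packets;
}

// Two complete packets followed by the first 2 bytes of a third one.
$buf = "\x07\x00\x00\x02" . "PAYLOAD" . "\x01\x00\x00\x01" . "\x01" . "\x05\x00";
$packets = take_packets($buf);

var_dump(count($packets), strlen($buf));
```

In a select loop, the script would append each fread() result to $buf and call the helper again; a partial header or payload simply waits in $buf until the rest arrives.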

i have used recv_from in the past and this is the first time i see it fail while there are supposedly unread bytes or other such weird behaviors. again maybe i have been very lucky for a long time.

i'm digging into stream_set_read_buffer since i suspect something changed by default
 [2020-02-05 11:13 UTC] skullnobrains at gmail dot com
hello again

i confirm that if i disable the read buffer, the behavior becomes somewhat consistent : PEEKING and then READING returns the same character(s).

but the number of unread bytes is now always zero
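
A small sketch of that setup, using a local socket pair instead of the real server (the pair and the variable names are illustrative). It assumes stream_set_read_buffer($stream, 0) is the call used to disable the buffer:

```php
<?php
// Illustrative only: $a plays the server, $b is the stream under test.
[$a, $b] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
stream_set_read_buffer($b, 0);   // disable PHP's internal read buffer on $b

fwrite($a, "0123456789");

$peeked = stream_socket_recvfrom($b, 4, STREAM_PEEK);   // look at 4 bytes, leave them queued
$read   = fread($b, 4);                                 // consumes those same 4 bytes
$unread = stream_get_meta_data($b)['unread_bytes'];     // PHP holds nothing back
$rest   = fread($b, 8192);                              // the remainder comes from the socket

var_dump($peeked, $read, $unread, $rest);
```

Without a PHP-side buffer, every read goes to the socket directly, so the peek and the following fread() see the same bytes and unread_bytes has nothing to count.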

--

from what i gather, it is fairly likely that PEEKING always worked previously for one of two reasons : either i used to PEEK as soon as the select call returned ( whereas in this case i read multiple packets, so php has a chance to move data from the network buffer into its own buffer before i PEEK ), or there was no buffering.

example of a piece of working code that allows both ssl and plain connections. those are integrated in a select loop.

if(!stream_get_meta_data($_r)["crypto"] and 22===ord(stream_socket_recvfrom($_r, 1, STREAM_PEEK))){
	# a first byte of 22 (0x16) is the TLS handshake record type
	dbg('wants ssl');
	$stats['ssl_wanted']++;
	$states[$_r]=10000;
	continue;
}
...
if($states[$_r]===10000){
	$stats['ssl_ec_calls']++;
	if(true===$crypto=stream_socket_enable_crypto($_r,true,STREAM_CRYPTO_METHOD_ANY_SERVER)){
		$states[$_r]=0;
		$stats['ssl_ok']++;
		dbg('SSL HANDSHAKE COMPLETED');
	}
	elseif($crypto===false)
		answ("480 crypto requested but handshake cannot complete");
	elseif($crypto!==0)
		answ("580 crypto requested but unexpected answer from enable_crypto()");
	sdbg('ssl handshake status %s',$crypto);
	continue;
}


since unread_bytes used to be more or less reliable and definitely not always zero ( i have not written code that actually relies on that much in a while so that may have been back in the php5 days ) i would assume the first case.

may i suggest that using recv_from on a buffered socket either raise at least a warning, since the behavior is quite erratic and depends on a race between the script and whatever php routine drains the network buffers, or be modified so that it takes the php buffer into account ? the latter would be consistent, rather than reading random later-transmitted bytes or blocking.

also note the documentation of the PEEK flag of stream_socket_recvfrom() is wrong as it clearly states the same bytes will be read again by a subsequent call to fread().

as far as i am concerned, i will stick to disabling php's internal buffering, forget about unread_bytes, and probably play a lot with blocking->non-blocking->blocking transitions so i can peek the number of bytes without blocking my scripts.

if the buffering stays that way, i see little reason why unread bytes could not be made reliable.

since select does not know about php's buffering, i believe the best course of action would be to integrate the buffer into recv_from() transparently and have unread bytes return the sum of the bytes in php's buffer and tcp's. we have low level socket functions for other cases.

best regards anyway and thanks again for your time
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Dec 13 02:01:27 2024 UTC