php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #79501 TLS connections freezing on 7.4 (all versions after 7.3.17)
Submitted: 2020-04-20 17:30 UTC Modified: 2024-03-29 16:27 UTC
Votes:23
Avg. Score:4.3 ± 0.8
Reproduced:22 of 22 (100.0%)
Same Version:9 (40.9%)
Same OS:17 (77.3%)
From: imnieves at gmail dot com Assigned: bukka (profile)
Status: Closed Package: OpenSSL related
PHP Version: 7.4.5 OS: Linux
Private report: No CVE-ID: None
Welcome back! If you're the original bug submitter, here's where you can edit the bug or add additional notes.
If you forgot your password, you can retrieve your password here.
Password:
Status:
Package:
Bug Type:
Summary:
From: imnieves at gmail dot com
New email:
PHP Version: OS:

 

 [2020-04-20 17:30 UTC] imnieves at gmail dot com
Description:
------------
Several users of phpredis (a PHP interface to Redis) have noticed (in php from 7.4.0 and onwards) that TLS connections to Redis "freeze".  This seems to mean that the connection stops or otherwise times out.  When we establish non-TLS connections to Redis (using the same phpredis), we experience no "freeze".  This seems to indicate that something in OpenSSL, or perhaps PHP itself, changed which led to the freezing behavior.

I have personally verified the issue exists on all versions of 7.4 from 7.4.0 to 7.4.5, and that the issue does not exists on version 7.3.17


Some more details and description exists at:
https://github.com/phpredis/phpredis/issues/1726

Test script:
---------------
Some testing setup exists at:
https://github.com/phpredis/phpredis/issues/1726

Expected result:
----------------
The expectation is that TLS connections do not "freeze".


Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2020-04-20 18:47 UTC] stas@php.net
-Status: Open +Status: Suspended -Type: Security +Type: Bug
 [2020-04-20 18:47 UTC] stas@php.net
I don't think "something in OpenSSL, or perhaps PHP itself, changed" is an actionable data that we could do anything with. If you have some specific reproducible issue traceable to PHP core or PHP extension covered by bugs.php.net, please reopen with specific information.
 [2020-04-20 22:28 UTC] imnieves at gmail dot com
I hope the information below is helpful, or at least enough to re-open.

==============================

From the previously mentioned website:

<?php

function l($message)
{
    global $current;
    echo $message . ' (' . (microtime(true) - $current) . ')' . PHP_EOL;
    $current = microtime(true);
}

ini_set('default_socket_timeout', 5);

$start = microtime(true);
$current = microtime(true);

l('start');
$redis = new Redis();
if (!$redis->connect('tls://redis-instance', 25061, 5, null, 5, 5)) {
    throw new RuntimeException($redis->getLastError());
}
l('connected');
$redis->setOption(Redis::OPT_READ_TIMEOUT, 5);
l('option is set');
$redis->auth('some-password');
l('auth complete');
$redis->select(4);
l('db selected');

echo 'Total: ' . (microtime(true) - $start) . PHP_EOL;

=========================

"occasionally this script hangs up for 300 seconds:"

=========================

user:~$ while true; do php redis.php; done
start (5.9604644775391E-6)
connected (0.017167091369629)
option is set (5.0067901611328E-6)
auth complete (0.00058794021606445)
db selected (0.00032281875610352)
Total: 0.018187999725342
start (4.0531158447266E-6)
connected (0.012931108474731)
option is set (7.1525573730469E-6)
auth complete (0.00078892707824707)
db selected (0.00040698051452637)
Total: 0.014256000518799
start (4.0531158447266E-6)
connected (0.01163911819458)
option is set (1.0967254638672E-5)
auth complete (300.07370495796)
db selected (0.00037598609924316)
Total: 300.08729100227

==========================

strace log:

==========================

write(3, "\27\3\3\0006[O\21\240\275|H\243\356\344\16\277\215\243\362h;\350*<N@\24\25\225\254["..., 59) = 59
fcntl(3, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR)               = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
read(3, "\27\3\3\0\26", 5)              = 5
read(3, "\262\341\277k3\317O\3143X\232\313\310\2742\246\v\203\341\332T\261", 22) = 22
fcntl(3, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR)               = 0
write(1, "2020-03-17 10:49:05 auth complet"..., 522020-03-17 10:49:05 auth complete (300.79558706284)
) = 52
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
write(3, "\27\3\3\0(\245\360\223\217\252\375p\311\324\246\323\1\303\243\360\343\302\2:\214\5'\"\224\32\2500"..., 45) = 45
fcntl(3, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR)               = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
read(3, "\27\3\3\0\26", 5)              = 5
read(3, "\327n\353\321\367\354\317[\274&\316)\237C\304\305\275\343\4\266\334\10", 22) = 22
fcntl(3, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR)               = 0
write(1, "2020-03-17 10:49:05 db selected "..., 542020-03-17 10:49:05 db selected (0.00063395500183105)
 [2020-05-04 14:52 UTC] nikic@php.net
-Status: Suspended +Status: Re-Opened
 [2020-05-04 14:52 UTC] nikic@php.net
One relevant difference that comes to mind is that PHP 7.4 will negotiate TLS 1.3 by default. Do you know which TLS version actually gets used?
 [2020-05-06 19:00 UTC] imnieves at gmail dot com
I will investigate
 [2020-05-06 20:51 UTC] imnieves at gmail dot com
I have verified that TLS 1.2 connections work consistently, while TLS 1.3 connections freeze (fail to establish) on PHP 7.4.5-fpm.

On the other side of the testing, both TLS 1.2 and TLS 1.3 connections work consistently on PHP 7.3.17-fpm.

To perform this test I put nginx in front of my Redis, disabled TLS on Redis entirely, and enabled TLS on nginx. I varied the TLS settings on nginx as follows: TLS 1.2 only, then TLS 1.3 only, then both TLS 1.2 and TLS 1.3.

Interesting to note, when nginx has:

only TLS 1.2, the connections always work
only TLS 1.3, the connections always freeze
both TLS 1.2 and TLS 1.3, the connections work for the first 5 attempts, and then one freezes. If I attempt another connection then the next 4 or 5 connections succeed and then another one freezes. And this cycle repeats.
 [2020-05-06 20:54 UTC] imnieves at gmail dot com
In a separate method of testing, I constrained TLS versions of phpredis (instead of constraining TLS versions of nginx)

In this way I have verified on php 7.4.5-fpm that TLS 1.2 results in consistent working connections. TLS 1.3 results in some connections that work and most other connections that fail.

I can also say the cipher suite being used in all my TLS 1.3 connections (consistent and freezing): TLS_AES_256_GCM_SHA384
 [2020-05-08 11:41 UTC] cmb@php.net
-Package: *Encryption and hash functions +Package: OpenSSL related
 [2020-05-08 11:41 UTC] cmb@php.net
According to a comment on the PhpRedis issue[1] this might be an
issue with our tlsv1.3 stream wrapper implementation:

| […] because right now it looks like something may be wrong in
| PhpRedis even if it just wraps php streams.

[1] <https://github.com/phpredis/phpredis/issues/1726#issuecomment-625319182>
 [2020-08-08 17:58 UTC] imnieves at gmail dot com
According to comment from a maintainer of PhpRedis[1], they feel that the issue is not coming from PhpRedis itself:

| I'm going to close this issue because it is actually not a bug in phpredis.

[1] https://github.com/phpredis/phpredis/issues/1726#issuecomment-652563196
 [2022-02-02 22:28 UTC] imnieves at gmail dot com
I will break this message into two parts, hopefully that will help this message to not be detected as SPAM.

I was able to determine the exact line that is blocking in PHP series 7.4 (7.4..27 precisely) when TLS 1.3 is being used.

The offending line is:
https://github.com/php/php-src/blob/96f753a2b56b7c4927f1a64253ca60bb481ee2c3/ext/openssl/xp_ssl.c#L2466

int n = SSL_peek(sslsock->ssl_handle, &buf, sizeof(buf));

For reference, the PHP method that calls this SSL_peek is itself called at:
https://github.com/php/php-src/blob/96f753a2b56b7c4927f1a64253ca60bb481ee2c3/main/streams/streams.c#L790

if (!stream->eof && PHP_STREAM_OPTION_RETURN_ERR ==
	   	php_stream_set_option(stream, PHP_STREAM_OPTION_CHECK_LIVENESS,
	   	0, NULL)) {
	stream->eof = 1;
}

After reading about so-called SSL non-application data records:
https://github.com/openssl/openssl/blob/6e94b5aecd619afd25e3dc25902952b1b3194edf/CHANGES#L237
https://wiki.openssl.org/index.php/TLS1.3#Non-application_data_records
openssl/openssl#7327

I believe the blocking is occurring because PHP 7.4 is assuming (in xp_ssl.c @ L2466, the first code snippet above) that if there is a record that can be read, then the record must be a data record and therefore the SSL_peek will not block. This assumption is false in both TLS 1.2 and TLS 1.3, however the impact of this assumption is felt rarely (if at all) in TLS 1.2 but it arises frequently in TLS 1.3, in fact it arises instantly in TLS 1.3.

In TLS 1.2, this line:
https://github.com/php/php-src/blob/96f753a2b56b7c4927f1a64253ca60bb481ee2c3/ext/openssl/xp_ssl.c#L2464

} else if (php_pollfd_for(sslsock->s.socket, PHP_POLLREADABLE|POLLPRI, &tv) > 0) {

evaluates to false.
But in TLS 1.3 it evaluates to true, which enables SSL_peek to be called, and block.
 [2022-02-02 22:29 UTC] imnieves at gmail dot com
Here is the second part of the message.


I was able to get TLS 1.3 working in PHP 7.4 by wrapping the SSL_peek function call and the logic immediately following that call with a call to SSL_pending. SSL_pending returns the count of data bytes that can be read, and these are application data bytes not the non-application data records, so if SSL_pending > 0 then SSL_peek will not block. Beyond that, SSL_pending itself will not block, so it is safe to call.

To show what I did, I converted these lines from:
https://github.com/php/php-src/blob/96f753a2b56b7c4927f1a64253ca60bb481ee2c3/ext/openssl/xp_ssl.c#L2466

int n = SSL_peek(sslsock->ssl_handle, &buf, sizeof(buf));
if (n <= 0) {
	int err = SSL_get_error(sslsock->ssl_handle, n);
	switch (err) {
		case SSL_ERROR_SYSCALL:
			alive = php_socket_errno() == EAGAIN;
			break;
		case SSL_ERROR_WANT_READ:
		case SSL_ERROR_WANT_WRITE:
			alive = 1;
			break;
		default:
			/* any other problem is a fatal error */
			alive = 0;
	}
}

to these lines:

if(SSL_pending(sslsock->ssl_handle)){
	int n = SSL_peek(sslsock->ssl_handle, &buf, sizeof(buf));
	if (n <= 0) {
		int err = SSL_get_error(sslsock->ssl_handle, n);
		switch (err) {
			case SSL_ERROR_SYSCALL:
				alive = php_socket_errno() == EAGAIN;
				break;
			case SSL_ERROR_WANT_READ:
			case SSL_ERROR_WANT_WRITE:
				alive = 1;
				break;
			default:
				/* any other problem is a fatal error */
				alive = 0;
		}
	}
}

I am not claiming this fixes the problem entirely or that there are no side-effects, as I did not run any automated tests or any other manual tests. But this seemingly improved behavior in our particular use case seems to increase the likelihood of the location of the flawed code.

In summary, I am thinking the issue is related how PHP streams wraps OpenSSL. The issue seems to be the incorrect assumption that if records are available to read then SSL_peek will not block. Finally, although this issue seems to arise mostly in TLS 1.3, this is quite likely also an issue in PHP streams on TLS 1.2, although very (possibly very very) rare.

For reference on that last sentence, see the first comment by mattcaswell at:
https://github.com/openssl/openssl/issues/7327


Further reference:
https://www.openssl.org/docs/man1.1.1/man3/SSL_pending.html
https://www.openssl.org/docs/man1.1.1/man3/SSL_read.html
 [2022-02-06 07:57 UTC] imnieves at gmail dot com
This bug has also noticed the same underlying problem:
https://github.com/reactphp/socket/issues/184

The problem is that when a server has sent no data, a client side call to feof or stream_eof will block when TLS 1.3 is used and will not block when TLS 1.2 is used.

I am quite confident my fix/workaround above is not correct.
 [2022-12-29 15:53 UTC] bukka@php.net
-Assigned To: +Assigned To: bukka
 [2022-12-29 15:53 UTC] bukka@php.net
I have done some investigation and updated the linked issue in phpredis as this might potentially be a problem with API usage there - it calls php_stream_eof before php_stream_write.
 [2022-12-30 11:57 UTC] bukka@php.net
As I just mentioned in the Redis issue thread 
( https://github.com/phpredis/phpredis/issues/1726#issuecomment-1367875600 ) and as was correctly mentioned here before by the OP, there is an issue with the liveness checking as the poll is only readable. The problem is the freezing and not respecting the socket timeout. Currently the only solution that I can think of is to do a switch to non blocking for eof check which we do in other places. We can't really know when SSL_peek blocks otherwise - neither SSL_pending nor poll on read is reliable here. I will need to think about it and also think how we could cover it by test as it will be challenging too.
 [2024-03-29 16:27 UTC] bukka@php.net
-Status: Re-Opened +Status: Closed
 [2024-03-29 16:27 UTC] bukka@php.net
This should be fixed by https://github.com/php/php-src/commit/c1bd9a932a55d2efba64126118477de7055b1c2e . It will be part of 8.2.19 and 8.3.6.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 08:01:29 2024 UTC