Bug #81516 Curl used with CURLOPT_WRITEFUNCTION rarely corrupts the response
Submitted: 2021-10-08 17:52 UTC Modified: 2021-10-14 20:02 UTC
From: roland at nextendweb dot com Assigned:
Status: Not a bug Package: cURL related
PHP Version: 7.4.24 OS:
Private report: No CVE-ID: None
 [2021-10-08 17:52 UTC] roland at nextendweb dot com
Description:
------------
This issue happened through the WordPress update system during a plugin update. Our update URL points to our private server, which serves a zip file. We have had tens of thousands of updates of this very same file, and only 2 sites had this issue, so I do not think it is related to how we serve the file.

The error is md5_mismatch: The checksum of the file (4a3b968a44c585a2883e687d61c251fb) does not match the expected checksum value (231732259d67fe83ed6fc02d7ad9be57).

I started to debug this issue with the original zip file and the corrupted one that WordPress downloaded. I diffed the files, and it seems some part is missing from the corrupted file, as if the stream lost a piece of the buffer. And it happens every time on those servers.

https://i.imgur.com/Nx0SKeK.png

The cURL transport handled this download, so I headed over to /wp-includes/Requests/Transport/cURL.php: https://github.com/WordPress/WordPress/blob/master/wp-includes/Requests/Transport/cURL.php

When I removed CURLOPT_BUFFERSIZE, or changed its value to anything else (Requests::BUFFER_SIZE+1, Requests::BUFFER_SIZE-1), the download_url() function started to behave as it should and the file was no longer corrupted.
<?php
// From wp-includes/Requests/Transport/cURL.php:
curl_setopt($this->handle, CURLOPT_BUFFERSIZE, Requests::BUFFER_SIZE);


The value of Requests::BUFFER_SIZE in WordPress is 1160.

Exactly one 1160-byte block is missing from the middle of the file, which matches the value of Requests::BUFFER_SIZE.

https://i.imgur.com/KoWkzco.png
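For context, this is roughly how the Requests transport wires the buffer size and the write callback together (a simplified sketch, not the exact WordPress code; the URL and target path are placeholders):

<?php
// Simplified sketch, not the exact WordPress code.
$stream = fopen('/tmp/plugin.zip', 'wb');
$handle = curl_init('https://updates.example.com/plugin.zip');

curl_setopt($handle, CURLOPT_BUFFERSIZE, 1160); // Requests::BUFFER_SIZE
curl_setopt($handle, CURLOPT_WRITEFUNCTION, function ($ch, $data) use ($stream) {
    // Each chunk cURL delivers is written straight to the target file.
    fwrite($stream, $data);
    // Returning the full length tells cURL the chunk was consumed,
    // even if fwrite() silently wrote less (or nothing).
    return strlen($data);
});

curl_exec($handle);
curl_close($handle);
fclose($stream);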


Both sites had the following configuration:
WordPress 5.8.1
PHP 7.4.24
Curl 7.71.0


History

 [2021-10-08 18:38 UTC] requinix@php.net
-Status: Open +Status: Feedback -Package: Streams related +Package: cURL related
 [2021-10-08 18:38 UTC] requinix@php.net
Are you sure this is related to CURLOPT_WRITEFUNCTION? Can you confirm that stream_body is not being called for some chunk in the middle of the response? Would be nice if you could modify cURL.php to not need the write function, but with a casual glance at the code I don't think it would be easy.
How about varying the buffer size to some fraction/multiple of 1160, like 580 or 2320?
And, of course, are there any relevant differences in configuration between the servers? How about with the files or paths being written to?

No offense but this doesn't really sound like a bug in PHP, or even in cURL itself, but rather some weird hiccup or fault with those specific servers.
 [2021-10-12 12:57 UTC] roland at nextendweb dot com
I was able to create a small test case which can reproduce the issue without WordPress. 

phpinfo(); -> https://gist.github.com/nextend/3905674a48c2da5c95dfea90b2115f82
PHP 7.4.24
cURL 7.71.0

Test code:
https://gist.github.com/nextend/2819c93635c5f1c68c13cf1e28a2ef19

The problem happens at CURLOPT_BUFFERSIZE => 1160, 1160*2, and 1160*3.
1160*4 seems to work fine.


See the md5 hash of the stream write data at 1160 buffer size. (Left is the normal server, right is the bad server) https://i.imgur.com/Jk0hzaM.png

Between chunks 3683 and 3711, the chunks' md5 hashes mismatch.
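The per-chunk logging in the test amounts to something like this (a sketch of the approach, not the gist's exact code; the URL is a placeholder):

<?php
// Log the index, md5 and length of every chunk the write callback receives.
$handle = curl_init('https://updates.example.com/test.zip');
$i = 0;

curl_setopt($handle, CURLOPT_BUFFERSIZE, 1160);
curl_setopt($handle, CURLOPT_WRITEFUNCTION, function ($ch, $data) use (&$i) {
    printf("Chunk #%d: %s %d\n", $i++, md5($data), strlen($data));
    return strlen($data);
});

curl_exec($handle);
curl_close($handle);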

If you want, I can send you the private URL for the test by email.
 [2021-10-12 13:51 UTC] roland at nextendweb dot com
Also worth mentioning: this issue does not happen when the server which serves the zip has HTTP/2 enabled. It happened only with HTTP/1.1.
 [2021-10-12 18:49 UTC] requinix@php.net
-Status: Feedback +Status: Open
 [2021-10-12 18:49 UTC] requinix@php.net
Can you modify your script to also include the data length in the log output?

After doing that and running the script (and if seeing the length doesn't identify the problem), can you additionally modify the fwrite() statement to be a loop:

  $data_length = strlen($data);
  for ($len = 0; $len < $data_length; ) {
    $len += fwrite($this->stream_handle, substr($data, $len));
  }

This is because fwrite() does not guarantee writing its entire string at once; it returns the number of bytes it actually wrote. This is almost never an issue when writing to a local file, though...
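A more defensive version of that loop would also bail out when fwrite() fails or makes no progress, so a persistent write failure cannot spin forever. A sketch of the same idea:

  $data_length = strlen($data);
  for ($len = 0; $len < $data_length; ) {
    $written = fwrite($this->stream_handle, substr($data, $len));
    if ($written === false || $written === 0) {
      break; // failed or wrote nothing; stop instead of looping forever
    }
    $len += $written;
  }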

Back to the data length. The output shows those 40 chunks *present* but with different hashes. Your original report said that 1160 bytes were *missing*. Does that mean you're seeing two different behaviors between the WP code and your script?
Because if cURL is giving the write function 1160 bytes, that means your original report had a problem with the callback 1 time while your new script had a problem with the callback 40 times.

Speaking of, note that the buffer size is "a request, not an order", so the chunks received by the callback may not be exactly that size.
https://curl.se/libcurl/c/CURLOPT_BUFFERSIZE.html
 [2021-10-12 18:49 UTC] requinix@php.net
-Status: Open +Status: Feedback
 [2021-10-13 08:15 UTC] roland at nextendweb dot com
Thank you for your tips!

As you wrote, CURLOPT_BUFFERSIZE is just a request and not an order, so the chunk boundaries can shift, and that alone makes the md5 hashes differ for several chunks. So I can see why my check of the chunks' md5 hashes might not be the right tool for debugging this.

Expected response at buffersize 4:
aaaa
bbbb
cccc
dddd
-> response hash OK

Valid response at buffersize 4:
aaaa
bbb    -> chunk hash mismatch
bcccc  -> chunk hash mismatch
dddd
-> response hash OK

Not valid response at buffersize 4:
aaaa
bb     -> chunk hash mismatch
ccc    -> chunk hash mismatch
dddd
-> response hash FAIL

I think the latter happens, but I will do more debugging with the length of the chunks.
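One way to tell the two cases apart is to hash the reassembled body incrementally and count the bytes across all chunks, independent of where cURL places the chunk boundaries. A sketch, with a placeholder URL:

<?php
// Shifted boundaries leave this hash and byte count intact;
// genuinely missing bytes change both.
$handle = curl_init('https://updates.example.com/test.zip');
$ctx = hash_init('md5');
$total = 0;

curl_setopt($handle, CURLOPT_BUFFERSIZE, 1160);
curl_setopt($handle, CURLOPT_WRITEFUNCTION, function ($ch, $data) use ($ctx, &$total) {
    hash_update($ctx, $data);
    $total += strlen($data);
    return strlen($data);
});

curl_exec($handle);
curl_close($handle);
echo hash_final($ctx), ' ', $total, " bytes\n";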

Currently I do not have access to the server which produced this error, but I will try to get one. We know of about 10 different servers which produced this, and they all ended up with the very same response MD5 hash, so it cannot be a coincidence.
 [2021-10-13 16:35 UTC] roland at nextendweb dot com
The following gist works like a charm and the md5 of the data is fine without the PHP stream write:

https://gist.github.com/nextend/61bb5a8b36ea126f3e95297f3f7de10e


I tried, but it takes forever to finish:
  for ($len = 0; $len < $data_length; ) {
    // if fwrite() fails and returns a falsy value, $len never advances,
    // so this loop never terminates
    $len += fwrite($this->stream_handle, substr($data, $len));
  }
 [2021-10-13 16:49 UTC] roland at nextendweb dot com
https://gist.github.com/nextend/8c0d752024bac9c0a030d5ba022dfac9

There is a chunk which is not written into the stream. The output is the following:
/ 1160
Chunk #4430: 781473d903bedf95d8a07d34974ac548 1160
Good md5: 231732259d67fe83ed6fc02d7ad9be57
Stream md5: 4a3b968a44c585a2883e687d61c251fb
Memory md5: 231732259d67fe83ed6fc02d7ad9be57

The return value of fwrite() for this chunk is NULL.
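Incidentally, the write callback can surface such a failure instead of hiding it: if the callback returns fewer bytes than it was given, cURL aborts the transfer with a write error rather than finishing with a corrupted file. A sketch, with $stream assumed from the surrounding test:

curl_setopt($handle, CURLOPT_WRITEFUNCTION, function ($ch, $data) use ($stream) {
    $written = fwrite($stream, $data);
    if ($written !== strlen($data)) {
        // Returning a short count makes curl_exec() fail with a write error
        // instead of completing with a silently corrupted file.
        return 0;
    }
    return $written;
});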
 [2021-10-13 17:05 UTC] roland at nextendweb dot com
It looks like these are some kind of special systems which are unable to write this particular data to the filesystem, and this is why that chunk is always missing.

https://gist.github.com/nextend/91b09c70a5a86fff34bcf87e29c9f068

The output is NULL and the file is not created in the filesystem. No error is thrown.
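Presumably the gist boils down to a minimal write like this (a sketch; '...' stands in for the offending base64-encoded block):

<?php
$chunk = base64_decode('...'); // the offending 1160-byte block
$fh = fopen('/tmp/blocked-chunk.bin', 'wb');

var_dump(fwrite($fh, $chunk));  // NULL on the affected servers, an int elsewhere
var_dump(error_get_last());     // nothing is reported despite the failed write

fclose($fh);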
 [2021-10-13 23:19 UTC] requinix@php.net
Weird. What happens if you have PHP output the data instead and then pipe that into cat? Or use some other method of writing out to a file?

<?php
// '...' stands for the offending chunk, base64-encoded
$value = '...';
echo substr(base64_decode($value), 0, 1100);
?>

$ php nowrite.php | cat > b.txt
 [2021-10-14 06:31 UTC] roland at nextendweb dot com
My latest finding is that the common factor among these servers is that all of them have a PHP module called i360, a security module developed by https://www.imunify360.com/

It seems that their module identifies that given chunk as a threat. I got in touch with the developer of that module and I will update this thread...
 [2021-10-14 16:53 UTC] roland at nextendweb dot com
The Imunify360 team confirmed this problem in their filters:

"Developers confirm that the Proactive Defence rule 10006 creates false positives and blocks write operations. We are currently working on a solution.
Until the solution is available, please temporarily whitelist rule 10006 in Imunify360 > Proactive Defense > Detected Events > Actions."

I think this bug can be closed.
 [2021-10-14 20:02 UTC] requinix@php.net
-Status: Feedback +Status: Not a bug
 [2021-10-14 20:02 UTC] requinix@php.net
Thanks for finding that out!
 