Description:
------------

It's currently not possible to efficiently run a curl_multi request while simultaneously selecting on unrelated PHP streams. The only workaround is to tick-tock between curl_multi_select() with an extremely small (or 0) timeout and stream_select() with a small timeout (e.g. 2500us), in an infinite loop (a rough sketch of this loop is included further below). This can be made to work for the most part with acceptable (but always nonzero) CPU usage, but magic-number tuning is necessary to balance constant CPU wastage against network responsiveness.

It's a generally broken situation that requires pulling in and learning how to coordinate a library like Swoole or Ev, or cracking one's knuckles and building upon stream_socket_client("ssl://...") to solve; you don't have any options if you want to stick with the baseline of what you can get just about everywhere with `apt install php-curl` or equivalent (if curl support isn't already bundled).

In bug #80956 I suggested an unnecessary, braindead approach to solving this problem, which involved cramming PHP stream objects into a curl_multi_wait() function and hoping for the best. Considering cURL's provenance, it's easy to see in retrospect that it was kind of silly to assume that a better way did not exist (though admittedly I did not initially search the documentation exhaustively).

While digging into some specifics of how cURL works internally earlier today, I discovered the very interesting curl_multi_fdset() function:

https://curl.se/libcurl/c/curl_multi_fdset.html

It takes caller-allocated pointers to fd_set structures, which it fills (FD_SET()s) with the socket descriptors cURL is currently using internally. The whole point of this function is to support the calling application taking over the top-level select() loop.

While searching for references suggesting when curl_multi_fdset() was introduced (A Long Time Ago, I think), I stumbled by chance on an outdated copy of ext/curl/multi.c from PHP 7 **actually using curl_multi_fdset() as part of curl_multi_exec()**:

https://github.com/php/php-src/blob/b4140bf64811b97af153a5d49a1d71677993a075/ext/curl/multi.c#L250

It literally grabs the fds from cURL and selects on them from the PHP side. I'm not sure why PHP 8 no longer does this (https://github.com/php/php-src/blob/master/ext/curl/multi.c#L186) - perhaps because CURLM_CALL_MULTI_PERFORM was deprecated (https://curl.se/mail/lib-2012-08/0042.html). Assuming the old code had no issues, it's practically perfect; we just need... the select()... moved... into PHP userspace... :)

To answer the discussion in bug #80956 regarding PHP not having an fd_set type: I think the most compatible, accessible, sane, "just make it work at all" way to expose this to PHP userspace would be to simply instantiate discrete PHP stream objects for each low-level FD_ISSET()-ed socket descriptor. The overhead incurred would not actually be the end of the world IMO. I would guess most realistic use-cases with a "lot" of connections are probably managing maybe 20 open sockets; that's 60 stream objects (read+write+except). 50 open connections is 150 stream objects, and 100 connections is 300 streams. It's certainly arguable that this is not ideal, but IMO it's also effectively counter-arguable that things could be much, much worse; frameworks do worse things millions of times a second :D HTTP/2 multiplexing may also reduce some parallel-download use-cases against the same (or a small set of) remotes to far fewer connections.
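For concreteness, here is a rough sketch of the tick-tock workaround mentioned at the top of this report. It is not part of the proposal; $appRead/$appWrite/$appExcept are placeholders for whatever unrelated streams the application is watching, and the timeouts are exactly the kind of magic numbers complained about above.

$mh = curl_multi_init();
// ... curl_multi_add_handle($mh, $easyHandle) for each transfer ...

$appRead   = [];   // placeholders for the application's own, non-cURL streams
$appWrite  = [];
$appExcept = [];

$still_running = 0;
do {
    // Tick: drive cURL's transfers, then poll its sockets with (almost) no wait.
    curl_multi_exec($mh, $still_running);
    curl_multi_select($mh, 0.0);                        // magic number #1

    // Tock: wait briefly on the unrelated application streams.
    $read   = $appRead;
    $write  = $appWrite;
    $except = $appExcept;
    if ($read || $write || $except) {
        stream_select($read, $write, $except, 0, 2500); // magic number #2: 2500us
        // ... service whatever became readable/writable ...
    }
} while ($still_running > 0);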
In the case where someone is doing something that translates to many hundreds of real sockets, and it's argued that that person's N+1st problem is that they're using PHP, and that PHP should generally not be expected to reasonably scale that far, the "doesn't reasonably scale" argument would translate in this case to "because it'll produce lots of stream object thrashing"... but that's not going to, for example, risk data corruption; it'll just slow things down a bit because PHP would be creating lots of stream objects every loop iteration. There is a very good question as to whether this would become noticeably quadratic, or only noticeable in (micro)benchmarks, but that discussion will likely only become concrete at the many-hundreds-of-connections point in the real world. Many users and use cases won't be affected by the stream (re)creation overhead. If things need further optimization badly enough under real-world load, an obvious way will either present itself or be hammered into existence :)

Test script:
---------------
An example of what a possible (currently non-existent as of May 2021) curl_multi_fdset() function might look like:

Prototype:

curl_multi_fdset(CurlMultiHandle $multi_handle, array &$read, array &$write, array &$except);

Semantics:

To be consistent with the majority of PHP functions that deal with multiple return values, this function would use argument pass-by-reference. Inserted streams would be appended to the supplied arrays at free integer indices, skipping over any existing indices; so if array indices 1, 2 and 4 were already set, indices 0, 3, 5, 6, ... would be used (a small illustration of this appears at the end of this report). The types and values of the preexisting contents of the arrays would be ignored.

If relevant, a small micro-optimization could be implemented, possibly in a later revision, that used PHP-internal knowledge of the passed arrays to determine whether an array only used integer indices and was numerically consecutive (e.g. [0, 1, 2, 3], without skipping any elements), and then short-circuit the index-skipping logic by pre-determining the first index at which to begin blindly inserting. I have absolutely no idea whether this would ultimately speed anything up, or if I'm just naively bikeshedding the implementation here :)

Example:

This shows one (potentially non-optimal) way the function could be used, distinguishing cURL sockets from application-specific streams.

$cm = curl_multi_init();

for (;;) {
    $read   = $this->read;   // app-specific stream pools
    $write  = $this->write;  // (here represented as part of a class instance)
    $except = $this->except;

    curl_multi_fdset($cm, $read, $write, $except); // does not exist yet as of May 2021

    stream_select($read, $write, $except, NULL);

    foreach ($read as $stream) {
        if (!in_array($stream, $this->read, true)) {
            // $stream is not in $this->read, so it was added by
            // curl_multi_fdset() - i.e. it is a cURL socket
        } else {
            // app-specific stream
        }
    }

    // (repeat for $write and $except)
}

As I said, this is potentially not the best approach (given that stream_select() works with arrays of streams, is there a better approach than in_array() here?); suggestions and/or boilerplate code for how best to use the function would probably be worth headscratching over and then putting into the documentation.
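One possible answer to the in_array() question above, offered only as a suggestion rather than part of the proposal: key the application's own streams by resource id once per iteration and test membership with isset(), avoiding the linear scan. This assumes PHP >= 8.0 for get_resource_id(); casting the stream resource to int gives the same id on older versions.

// Build a lookup table of the application's own streams, keyed by resource id.
$appStreams = [];
foreach ([$this->read, $this->write, $this->except] as $pool) {
    foreach ($pool as $stream) {
        $appStreams[get_resource_id($stream)] = true;
    }
}

foreach ($read as $stream) {
    if (isset($appStreams[get_resource_id($stream)])) {
        // app-specific stream
    } else {
        // stream created by the (proposed) curl_multi_fdset()
    }
}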
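Finally, to make the index-skipping insertion semantics described under "Semantics" concrete, here is a purely illustrative userland equivalent. curl_fdset_append() is a hypothetical helper invented for this sketch (the real logic would live inside ext/curl), and plain strings stand in for stream resources.

function curl_fdset_append(array &$set, array $curlStreams): void
{
    $i = 0;
    foreach ($curlStreams as $stream) {
        while (array_key_exists($i, $set)) {
            $i++;                 // skip indices the caller already occupies
        }
        $set[$i++] = $stream;
    }
}

// Example: indices 1, 2 and 4 are already taken, so new entries land at 0, 3 and 5.
$read = [1 => 'app-stream-1', 2 => 'app-stream-2', 4 => 'app-stream-3'];
curl_fdset_append($read, ['curl-sock-1', 'curl-sock-2', 'curl-sock-3']);
// array_keys($read) is now [1, 2, 4, 0, 3, 5] (insertion order).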