php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Doc Bug #63842 Infinite loop in example code
Submitted: 2012-12-24 06:48 UTC Modified: 2013-02-17 21:58 UTC
From: dean at ovts dot com dot au Assigned:
Status: Not a bug Package: Documentation problem
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None
 [2012-12-24 06:48 UTC] dean at ovts dot com dot au
Description:
------------
---
From manual page: http://www.php.net/function.curl-multi-exec
---

I think there's a bug in the sample code.

If curl_multi_select() returns -1, then the while{} loop just keeps calling it until it returns something else.

So if repeated calles to curl_multi_select() can return -1 indefinitely (and the comments on the manual page for that function suggest it can), then the example code will just go into an infinite loop.



Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2012-12-31 23:04 UTC] mail+php at requinix dot net
curl_multi_init()'s example has the same problem. Given the history of the 
underlying bug (see bug #63411, bug #61141) I'm hesitant to just go in and 
submit a change.

In the comments for 61141 are two code bits that should work, the preferred one 
(from pierrick) being

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) == -1) usleep(100);
    do { $mrc = curl_multi_exec($mh, $active); }
    while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
 [2013-01-03 15:05 UTC] dean at ovts dot com dot au
That would certainly fix the inifinite loop problem.

I'm not clear on why you'd need to wait 100ms (or some other arbitrary amount of time) before calling curl_multi_exec again... this just seems like bad design to me (even if the libcurl docs recommend it).  How many times might you end up doing this 100ms sleep?  The whole point of curl_multi_select is to sleep exactly the minimum amount, and wake up as soon as some curl operation is ready to continue.

What would happen if you simply ignored the return value of curl_multi_select, and did this:

  while (curl_multi_exec($mh, $active) === CURLM_CALL_MULTI_PERFORM);
  do
  {
    curl_multi_select($curlMH);
    while (curl_multi_exec($curlMH, $active) === CURLM_CALL_MULTI_PERFORM);
  } while ($active);


(Also, do we need to break if curl_multi_exec doesn't return CURLM_OK, or can we just keep going until $active is false, as I've done above?  What would it mean if $active were true, but the return value wasn't CURLM_OK?)
 [2013-01-03 15:09 UTC] dean at ovts dot com dot au
Whoops... I meant to change all the $curlMH variables to $mh to match the previous code snippet, but I only did the first one.  So '$curlMH' in the above code should actually be '$mh'.  Sorry about that!
 [2013-01-16 18:31 UTC] googleguy@php.net
-Status: Open +Status: Not a bug
 [2013-01-16 18:31 UTC] googleguy@php.net
This is not a bug, because the condition of the loop is dependent upon 
curl_multi_select. If the curl_multi_select returns -1 the code within the loop 
body is simply not executed. This still does not create an infinite loop under 
normal conditions because eventually if all of the curl handles timeout they are 
freed from the select resource handle.
 [2013-01-16 18:31 UTC] googleguy@php.net
*Correction to my last comment

"This is not a bug, because the condition of the loop is NOT dependent upon 
curl_multi_select."
 [2013-01-16 19:14 UTC] dean at ovts dot com dot au
If the loop is entered and curl_multi_select then returns -1, the variables determining the loop condition remain unchanged, so curl_multi_select is simply called repeatedly until such time as it returns something other than -1.

Googleguy, I don't quite understand what you were saying:  Is it that eventually curl_multi_select is *GUARANTEED* to return something other than -1?  *IF* that's true then there's no infinite loop.

However, it's still very unclear exectly how/why this code is supposed to work.  I'd like to see an explanation/clarification in the docs of the return values from curl_multi_exec and curl_multi_select, and of the $active flag.  What exactly does a -1 returned from curl_multi_select imply?  And (again) what would it mean if $active were true, but the return value from curl_multi_exec wasn't CURLM_OK?

At the moment, without any such explanation in the docs, the example code is like an opaque magic formula for how curl_multi_exec should be used.  It's difficult to trust it, when there are many "alternative versions" posted on the internet, and some on other php docs pages too!
 [2013-01-16 19:21 UTC] dean at ovts dot com dot au
Googleguy, under  bug #61141, several people report that they do indeed get an infinite loop because curl_multi_select returns -1 indefinitely...
 [2013-02-07 19:02 UTC] karl at silverglassworks dot com
The current example is definitely an incorrect way of doing things, and was 
causing an infinite loop for me today.

From the cURL documentation 
(http://curl.haxx.se/libcurl/c/curl_multi_fdset.html):

******************************************************************************
When libcurl returns -1 in max_fd, it is because libcurl currently does 
something that isn't possible for your application to monitor with a socket and 
unfortunately you can then not know exactly when the current action is completed 
using select(). When max_fd returns with -1, you need to wait a while and then 
proceed and call curl_multi_perform anyway. How long to wait? I would suggest 
100 milliseconds at least, but you may want to test it out in your own 
particular conditions to find a suitable value.
******************************************************************************

So in the situation where libcurl can't use select(), it will always return -1, 
and thus the example will be an infinite loop, as I have been experiencing.  I 
changed my code to the following, and it now works:

    while ($active && $mrc == CURLM_OK) {
      if (curl_multi_select($mh) === -1) {
        usleep(100);
      }
      do {
        $mrc = curl_multi_exec($mh, $active);
      } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
 [2013-02-17 16:53 UTC] dean at ovts dot com dot au
Well, I've looked over the relevant libcurl docs, and taken a look at how others on the internet are using curl_multi, and based on that, here's what I think represents best-practice, as example code for including in the PHP documentation pages:



$MAX_SIMULTANEOUS = 50;  // Adjust to whatever number of maximum simultaneous requests you think is appropriate.

// Fill array $reqList[] with whatever data you're using to generate your CURL requests,
// one request per array element, with consecutive indices starting at 0.
// note that requests at the END of the array will be started first.

if ($i = count($reqList)) // check that there is something to do...
{
  $handleMap = new SplObjectStorage;  // use this to associate curl handles with the data that created them
  $curlMH = curl_multi_init();

  $x = $i - $MAX_SIMULTANEOUS; if ($x<0) $x = 0;
  while ($i>$x) initRequest($reqList[--$i]);

  do
  {
    while (($mrc = curl_multi_exec($curlMH, $active)) === CURLM_CALL_MULTI_PERFORM);
    if ($mrc !== CURLM_OK) break;  // Shouldn't normally ever happen; look at the list of CURLM_ errors to see when it might.
    while ($info = curl_multi_info_read($curlMH))  // a request has completed
    {
      $ch = $info['handle'];
      if (($errNo = $info['result']) === CURLE_OK) processResult($handleMap[$ch], curl_multi_getcontent($ch));
        else ... // handle failed CURL request, (e.g. write to the error log or database, output a message, etc.)
      if ($i) initRequest($reqList[--$i]);  // start new request if one is waiting
      curl_multi_remove_handle($curlMH, $ch); curl_close($ch); $handleMap->detach($ch);  // clean up completed request
    }
    // wait for next CURL operation to complete, or sleep a short time if CURL is busy but unable to "block" using curl_multi_select
    if ($active && (curl_multi_select($curlMH) === -1)) usleep(50);
  }
  while ($active);

  if ($mrc !== CURLM_OK) ... // optinally handle CURLM_ errors (e.g. write to error log, output message, etc.)

  // clean up
  foreach ($handleMap as $ch) { curl_multi_remove_handle($curlMH, $ch); curl_close($ch); }
  curl_multi_close($curlMH);
  $reqList = $curlMH = $handleMap = null;
}

function initRequest($reqData)
{
  global $curlMH, $handleMap;

  // process $reqData to get $url (the request URL), and any POST data, or other CURL options.
  // also generate $custReqData, which will be passed to processResult to identify the request
  // when it has completed (may be the same as $reqData).

  $ch = curl_init($url);
  curl_setopt($ch, ...);   // set other CURL options
  curl_setopt($ch, ...);

  curl_multi_add_handle($curlMH, $ch);
  $handleMap[$ch] = $custReqData;
}


function processResult($custReqData, $response)
{
  // process $response (the data returned from the CURL request)
  // using $custReqData to identify which request it belongs to.
}
 [2013-02-17 21:58 UTC] dean at ovts dot com dot au
Oops... looks like you can't use splObjectStorage to store resources...

So, I fixed it by using an array instead.  I make use of the fact that ((string) $resource) is a string like "Resource id #123", where the "123" is a unique integer identifying a (current) resource. I trim the first 13 chars and convert the number part to an integer.  NOTE that this string format isn't guaranteed to remain unchanged in later versions of PHP, so you'd either have to check for a change after upgrading, or else you could just use the entire string as an array key (instead of just the integer part), at the cost of some efficiency.

Here's the fixed sample code (comments/discussion welcome!):




$MAX_SIMULTANEOUS = 50;  // Adjust to whatever number of maximum simultaneous requests you think is appropriate.

// Fill array $reqList[] with whatever data you're using to generate your CURL requests,
// one request per array element, with consecutive indices starting at 0.
// note that requests at the END of the array will be started first.

if ($i = count($reqList)) // check that there is something to do...
{
  $handleMap = array();  // use this to associate curl handles with the data that created them
  $curlMH = curl_multi_init();

  $x = $i - $MAX_SIMULTANEOUS; if ($x<0) $x = 0;
  while ($i>$x) initRequest($reqList[--$i]);

  do
  {
    while (($mrc = curl_multi_exec($curlMH, $active)) === CURLM_CALL_MULTI_PERFORM);
    if ($mrc !== CURLM_OK) break;  // Shouldn't normally ever happen; look at the list of CURLM_ errors to see when it might.
    while ($info = curl_multi_info_read($curlMH))  // a request has completed
    {
      $ch = $info['handle'];
      $k = intval(substr((string) $ch, 13));
      if (($errno = $info['result']) === CURLE_OK) processResult($handleMap[$k][1], curl_multi_getcontent($ch));
        else ... // handle failed CURL request, (e.g. write to the error log or database, output a message, etc.)
      if ($i) initRequest($reqList[--$i]);  // start new request if one is waiting
      curl_multi_remove_handle($curlMH, $ch); curl_close($ch); unset($handleMap[$k]);  // clean up completed request
    }
    // wait for next CURL operation to complete, or sleep a short time if CURL is busy but unable to "block" using curl_multi_select
    if ($active && (curl_multi_select($curlMH) === -1)) usleep(50);
  }
  while ($active);

  if ($mrc !== CURLM_OK) ... // optinally handle CURLM_ errors (e.g. write to error log, output message, etc.)

  // clean up
  foreach ($handleMap as $ch) { curl_multi_remove_handle($curlMH, $ch[0]); curl_close($ch); }
  curl_multi_close($curlMH);
  $reqList = $curlMH = $handleMap = null;
}

function initRequest($reqData)
{
  global $curlMH, $handleMap;

  // process $reqData to get $url (the request URL), and any POST data, or other CURL options.
  // also generate $custReqData, which will be passed to processResult to identify the request
  // when it has completed (may be the same as $reqData).

  $ch = curl_init($url);
  curl_setopt($ch, ...);   // set other CURL options
  curl_setopt($ch, ...);

  curl_multi_add_handle($curlMH, $ch);
  $handleMap[intval(substr((string) $ch, 13))] = array($ch, $custReqData);
}


function processResult($custReqData, $response)
{
  // process $response (the data returned from the CURL request)
  // using $custReqData to identify which request it belongs to.
}
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon Sep 16 04:01:28 2024 UTC