
May
29th, 2008

Asynchronous/parallel HTTP requests using PHP multi_curl


When working with web services, curl quickly becomes your best friend. It gets even better when you dig into PHP’s curl_multi functions. The downside to accessing web services at run time is that HTTP connections can be slow, and the problem is multiplied when you have to call multiple web services for a given page. PHP’s curl_multi_* functions help drastically because they let you make non-blocking, parallel requests. This means you can continue processing the page without waiting for a response.

Most of the tutorials online show examples of parallel curl requests using the same pattern: fire off a handful of requests and then block later on until all the requests have completed. But what if you don’t need all the information right away? Perhaps the response you need is already available while another request is still waiting. Wouldn’t it be nice to get what you need when you need it? Of course it would. Here is what we are going to do:

  1. Execute curl requests as needed
  2. Access responses as needed
  3. Wrap this functionality in an easy to use class
  4. Offer a consistent interface to access the response
  5. Show a working example

Execute curl requests as needed
PHP’s curl_multi_init creates a container for one or more curl handles created by curl_init. It lets you run them in parallel while you continue processing other PHP code. You can call curl_multi_exec at any time to start any handles in the stack that haven’t been started yet and to advance the ones already running. Its second parameter is passed by reference and is set to the number of handles that are still running.

$mch = curl_multi_init();

$ch1 = curl_init('http://www.yahoo.com');
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, 1);
$ch2 = curl_init('http://www.google.com');
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, 1);

curl_multi_add_handle($mch, $ch1);
curl_multi_exec($mch, $active);

// do some other processing while the first request runs

curl_multi_add_handle($mch, $ch2);
curl_multi_exec($mch, $active);

// do some more processing while both requests run

do{
  curl_multi_exec($mch, $active);
}while($active > 0);

$resp1 = curl_multi_getcontent($ch1);
$resp2 = curl_multi_getcontent($ch2);
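
The final do-while loop above calls curl_multi_exec as fast as it can, which keeps the CPU busy while it waits. Here is a minimal sketch of a gentler wait, assuming your PHP build provides curl_multi_select. The helper name waitForAll and the 1 second timeout are my own choices for illustration and are not part of the code in this post.

```php
<?php
// Hypothetical helper: drive every handle on $mch to completion without busy-looping.
function waitForAll($mch, $timeout = 1.0)
{
  do {
    $status = curl_multi_exec($mch, $active);
    if ($status === CURLM_OK && $active > 0) {
      // Sleep until there is activity on a connection or the timeout expires.
      // Some libcurl versions return -1 immediately; pause briefly in that case.
      if (curl_multi_select($mch, $timeout) === -1) {
        usleep(1000);
      }
    }
  } while ($active > 0 && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

  // CURLM_OK when everything ran normally, otherwise the curl_multi error code
  return $status;
}
```

You would call waitForAll($mch) in place of the do-while loop and then read the responses with curl_multi_getcontent as before.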

Access responses as needed
The above code is a huge improvement over blocking on the curl requests to Yahoo! and Google one after the other. But say Google was being slow and you needed the response from Yahoo! first? The above code would force you to wait for the response from Google before you could use the response from Yahoo!. We can use the second parameter to curl_multi_exec to find out when a request has completed, because the count it holds drops each time a transfer finishes. What we can do is check $active on each pass through the do-while loop and store any response received. If the response received is the one we’re looking for then we can simply exit the loop.

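$previousActive = -1; // forces a check on the first pass
do{
  curl_multi_exec($mch, $active);
  if($active != $previousActive){
    // the number of running requests changed, so there may be a new response to save
    // if this is the response we were looking for then exit the loop
  }
  $previousActive = $active;
}while($active > 0);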
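
The two comments inside that loop can be filled in with curl_multi_info_read, which reports the handles that have finished. What follows is only a sketch: collectFinished is a hypothetical helper name, and the shape of the array it returns is an assumption made for this example.

```php
<?php
// Hypothetical helper: drain the multi handle's message queue and return
// one entry per transfer that has finished since the last call.
function collectFinished($mch)
{
  $finished = array();
  while ($done = curl_multi_info_read($mch)) {
    $ch = $done['handle'];
    $finished[] = array(
      'handle' => $ch,
      'ok'     => $done['result'] === CURLE_OK,       // curl-level success, not the HTTP status
      'data'   => curl_multi_getcontent($ch),         // body, if CURLOPT_RETURNTRANSFER was set
      'code'   => curl_getinfo($ch, CURLINFO_HTTP_CODE),
    );
    // a finished handle stays on the stack until it is removed
    curl_multi_remove_handle($mch, $ch);
  }
  return $finished;
}
```

Inside the if block you could then loop over collectFinished($mch), compare each entry's handle to $ch1 with ===, and break out of the do-while as soon as the one you want shows up.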

Wrap this functionality in an easy to use class
It turns out that all you need to do is manage your curl handles. In order to do this we’re going to make the curl wrapper class a singleton. Additionally, we will create another class to manage the curl handles. When adding a curl handle we are going to return an instance of this manager class after having instantiated it with a unique identifier. The unique identifier we’re going to use is the string value of the curl handle (string)curl_init(). We’ll get into the code later.
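
The key only has to be unique per handle and stay the same for as long as the handle is alive. When curl_init returns a resource, the string cast gives something like "Resource id #5". Since PHP 8.0 curl_init returns a CurlHandle object, which cannot be cast to a string, so a sketch that covers both cases might look like the following. The name handleKey is hypothetical and is not part of EpiCurl.

```php
<?php
// Hypothetical helper: derive a unique, stable key for a curl handle.
function handleKey($ch)
{
  if (is_object($ch)) {
    // PHP 8.0+: curl_init() returns a CurlHandle object
    return 'curl#' . spl_object_id($ch);
  }
  // Older PHP: curl_init() returns a resource; the cast yields e.g. "Resource id #5"
  return (string) $ch;
}
```

Object ids can be reused once an object is destroyed, so this relies on the wrapper class keeping a reference to every handle it has been given.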

Offer a consistent interface to access the response
In order to offer a consistent interface we are going to define a few member variables for the manager class. For simplicity’s sake we will start with data for the response and code for the HTTP status code. Instead of initializing these with the object we can use PHP’s __get magic method. Now the first time we access $manager->data it will call the __get method. In the __get method we will do the blocking and wait for the response. Once the response is received we’ll store it in case it’s accessed again later. I am not going into the details of this code as it should be self-explanatory with the notes above.

Source also available at GitHub.

class EpiCurl
{
  const timeout = 3;
  static $inst = null;
  static $singleton = 0;
  private $mc;
  private $msgs;
  private $running;
  private $requests = array();
  private $responses = array();
  private $properties = array();

  function __construct()
  {
    if(self::$singleton == 0)
    {
      throw new Exception('You must instantiate it using: $obj = EpiCurl::getInstance();');
    }

    $this->mc = curl_multi_init();
    $this->properties = array(
      'code'  => CURLINFO_HTTP_CODE,
      'time'  => CURLINFO_TOTAL_TIME,
      'length'=> CURLINFO_CONTENT_LENGTH_DOWNLOAD,
      'type'  => CURLINFO_CONTENT_TYPE
      );
  }

  public function addCurl($ch)
  {
    $key = (string)$ch;
    $this->requests[$key] = $ch;

    $res = curl_multi_add_handle($this->mc, $ch);
    if($res == 0)
    {
      curl_multi_exec($this->mc, $active);
      return new EpiCurlManager($key);
    }
    else
    {
      return $res;
    }
  }

  public function getResult($key = null)
  {
    if($key != null)
    {
      if(isset($this->responses[$key]))
      {
        return $this->responses[$key];
      }

      $running = null;
      do
      {
        curl_multi_exec($this->mc, $runningCurrent);
        // check on the first pass too, in case a request finished during an earlier call
        if($running === null || $runningCurrent != $running)
        {
          $this->storeResponses();
          if(isset($this->responses[$key]))
          {
            return $this->responses[$key];
          }
        }
        $running = $runningCurrent;
      }while($runningCurrent > 0);
    }

    return false;
  }

  private function storeResponses()
  {
    while($done = curl_multi_info_read($this->mc))
    {
      $key = (string)$done['handle'];
      $this->responses[$key]['data'] = curl_multi_getcontent($done['handle']);
      foreach($this->properties as $name => $const)
      {
        $this->responses[$key][$name] = curl_getinfo($done['handle'], $const);
        curl_multi_remove_handle($this->mc, $done['handle']);
      }
    }
  }

  static function getInstance()
  {
    if(self::$inst == null)
    {
      self::$singleton = 1;
      self::$inst = new EpiCurl();
    }

    return self::$inst;
  }
}

class EpiCurlManager
{
  private $key;
  private $epiCurl;

  function __construct($key)
  {
    $this->key = $key;
    $this->epiCurl = EpiCurl::getInstance();
  }

  function __get($name)
  {
    $responses = $this->epiCurl->getResult($this->key);
    return $responses[$name];
  }
}
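
The lazy loading in EpiCurlManager::__get is easier to see in isolation. Below is a small sketch that involves no curl at all. LazyResponse and its loader callback are hypothetical stand-ins, with the callback playing the role of EpiCurl::getResult, and the sample values are made up.

```php
<?php
// Hypothetical stand-in for EpiCurlManager: the loader is only invoked
// the first time a property is read, and its result is kept afterwards.
class LazyResponse
{
  private $loader;
  private $loaded = null;

  function __construct($loader)
  {
    $this->loader = $loader;
  }

  function __get($name)
  {
    if ($this->loaded === null) {
      // First access: this is where EpiCurl would block and wait for the response.
      $this->loaded = call_user_func($this->loader);
    }
    return isset($this->loaded[$name]) ? $this->loaded[$name] : null;
  }
}

$calls = 0;
$response = new LazyResponse(function () use (&$calls) {
  $calls++;
  return array('data' => '<html></html>', 'code' => 200);
});

echo $response->code, "\n"; // triggers the loader
echo $response->data, "\n"; // served from the stored array, loader is not called again
```

In EpiCurl itself the stored copy lives in the singleton's $responses array rather than in the manager object, but the effect for the caller is the same.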

Show a working example
Here is how it looks to implement. It’s very clean and consistent…two of the goals we set out to achieve. If you have any questions then let me know in the comments.

include 'EpiCurl.php';
$mc = EpiCurl::getInstance();

$ch1 = curl_init('http://www.yahoo.com');
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, 1);
$curl1 = $mc->addCurl($ch1);

// connect to a database
// loop over some records
// authenticate a user

$ch2 = curl_init('http://www.google.com');
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, 1);
$curl2 = $mc->addCurl($ch2);

// open a file
// loop over the lines in the file
// close the file

$ch3 = curl_init('http://www.slooooooooooooooooow.com');
curl_setopt($ch3, CURLOPT_RETURNTRANSFER, 1);
$curl3 = $mc->addCurl($ch3);

echo "Response code from Yahoo! is {$curl1->code}\n";
echo "Response code from Google is {$curl2->code}\n";
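
The third request above goes to a deliberately slow host and its response is never read, so it does not hold up the page. If you did read $curl3->data you would wait for as long as curl is willing to. One way to cap that is to set explicit time limits on each handle. This is a sketch under assumptions: createHandle is a hypothetical helper and the timeout values are picked only for illustration.

```php
<?php
// Hypothetical helper: build a curl handle with explicit time limits.
function createHandle($url, $connectTimeout = 2, $totalTimeout = 3)
{
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout); // seconds allowed to establish the connection
  curl_setopt($ch, CURLOPT_TIMEOUT, $totalTimeout);          // seconds allowed for the whole transfer
  return $ch;
}
```

With that in place the third request could be added as $curl3 = $mc->addCurl(createHandle('http://www.slooooooooooooooooow.com'));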

Resources

24 Responses to “Asynchronous/parallel HTTP requests using PHP multi_curl”

  1. Raul Says:

    Hi,
    I have read your samples, and I have a question. Suppose I need to make asynchronous calls to a URL no matter what the response is; I just need to process the URL several times. What would you recommend?

  2. jaisen Says:

    @Raul,

    You could do it in the same manner as you would if you were calling different urls. You won’t be guaranteed the order in which the requests are processed though.

    In the last sample code simply replace yahoo.com, google.com and slooow.com urls with identical urls.

  3. Alistair Says:

    I believe there is an error in the script, as it was failing on my server. The problem is at the following if statement:

    $res = curl_multi_add_handle($this->mc, $ch);
    if($res == 0)
    {
    curl_multi_exec($this->mc, $active);

    One of the responses that can be returned by curl_multi_add_handle is ‘-1’. This is described in the CURL error list (http://curl.haxx.se/libcurl/c/libcurl-errors.html) as:

    “CURLM_CALL_MULTI_PERFORM (-1)

    This is not really an error. It means you should call curl_multi_perform(3) again without doing select() or similar in between. ”

    I don’t think the script should fail on this ‘error’ as it is not really an error. Changing the if statement to execute if a -1 is discovered fixes the problem. NB this does not appear to be a problem with all versions of PHP.

  4. jaisen Says:

    @Alistair,

    Thanks for pointing that out. Here are the changes I made as a result.


    if($res === CURLM_OK || $res === CURLM_CALL_MULTI_PERFORM)
    {
    do {
    $mrc = curl_multi_exec($this->mc, $active);
    } while ($mrc === CURLM_CALL_MULTI_PERFORM);

    Is that similar to your fix?

    Can I put your name/email in the notes for credits for the fix in the source?

  5. Alistair Says:

    Yes, that’s pretty much what I am now doing.

    I also think it would be good practice to close the curl handles after they are removed from multi-CURL as:

    “When a single transfer is completed, the easy handle is still left added to the multi stack. You need to first remove the easy handle with curl_multi_remove_handle(3) and then close it with curl_easy_cleanup(3), or possibly set new options to it and add it again with curl_multi_add_handle(3) to start another transfer.”
    (source http://curl.haxx.se/libcurl/c/libcurl-multi.html)

    Re: credits. Feel free to put in my name, but not email please.

  6. Jonny 5 Says:

    Just wanted to point out:

    curl_multi_info_read = PHP 5.2+
    http://at.php.net/curl_multi_info_read

  7. Jonny 5 Says:

    Hello again,

    first, thanks a lot for your work :)

    I had some problems with memory overflow, when running in php-cli. I’m not experienced with classes, so please correct me.

    My own function uses class EpiCurl to pull X sources. After that I call destruct, which I added to EpiCurl:

    function __destruct()
    {
    // close handles
    foreach($this->requests AS $ch){
    curl_close($ch);
    }

    // close multi handle
    if(curl_multi_info_read($this->mc)){
    curl_multi_close($this->mc);
    }

    // unset misc
    unset(
    $this->requests,
    $this->responses,
    $this->properties,
    $this->running,
    $this->mc
    );

    // reset class
    self::__construct();
    }

    regards, Robert

  8. Alan H Says:

    Cool code.

    I’m having a weird problem with it though:

    I have a server application which I call several times in a loop and process the responses later. The server expects username and password so I have a static wrapper class which sets all the required cURL options and then calls EpiCurl::getInstance followed by addCurl.

    This all works fine if I call in a loop, and then immediately process the results. If, however I simulate some processing by adding a sleep(5) or a big loop, then go process the responses the first response is an authentication error from the server.

    So: (inside my static class)

    static function wrapper($url)
    {
    $ch = curl_init();
    // set lots of cURL options inc…
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_DIGEST);
    curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
    // etc
    $mc = EpiCurl::getInstance();
    return $mc->addCurl($ch);
    }

    Then: (in my client code)

    $cha = array();
    $urls = array(…urls…);
    foreach ($urls as $url)
    {
    $cha[] = staticclass::wrapper($url);
    }

    sleep(5);

    foreach ($cha as $ch)
    {
    print $ch->data;
    }

    Without the sleep(5) this works fine on each url. With the sleep(5) the first response is an authentication error.

    If I put $x = $cha[0]->code after the first loop (before the sleep(5)) then everything works fine.

    Any ideas? This has me stumped!

    Thanks,
    Alan

  9. jaisen Says:

    3 of my coworkers and I spent a few days trying to figure out why this was happening. In our case we had something which took 4 seconds in between making the request and getting the results. We tracked it down to a magic number of ~2 seconds which would break the curl requests. No guarantee that your problem is the same as ours but it sounds almost identical.

    What was happening on our end was that requests were not completely being sent to the server. It was establishing the connection but never sending the request. If we looked at tcpdump we noticed that we got an ACK then a FIN. So then it waits 4 seconds and by the time we come back to get our results curl has timed out.

    One major problem is that there’s no way to differentiate between data which needs to be sent from data which needs to be received. We want to block for the first but not for the second (unless we’re asking for said data). This isn’t a problem specifically with PHP’s curl. The underlying curl library doesn’t expose this nor do the system calls.

    I have been meaning to patch this. I will do that in the next day or two and then add a comment to this post.

    Thanks for the feedback…this bug was a pain in the tail!

  10. Scot Says:

    Would it be possible to use this library to get asynchronous notifications when reading a very large binary file? We want to be able to read a very large binary file and periodically process fixed length buffers read from this binary file.

  11. Josh Fraser Says:

    Thanks for sharing this example. I made my own variation of this that makes things a lot faster when you’re dealing with a large number of requests:

    http://onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/

  12. Albert Says:

    Great library,

    I just want to add that this library only works properly with PHP 5.2 or greater.

    It didn’t work on a 5.1 and there were no errors, I blame it on the curl_multi_info_read.

  13. jaisen Says:

    @Josh, Thanks for commenting. However, I didn’t really see how it’s necessarily faster. Your approach of optionally throttling the requests is interesting, but could you expand on the benefits that I might not be thinking of? I’d love to incorporate any improvements into my libraries as well.

  14. Josh Fraser Says:

    @jaisen you’re right. it’s not faster. i saw your first example but somehow missed the part where you later eliminated the blocking.

    i optimized my code for processing 1000’s of requests as fast as possible. how well does your library handle a large number of requests at one time? from my experience, curl_multi fails without errors once you ask it to deal with around 200 or so simultaneous requests. i’d be curious to hear if you’ve run into that issue and if you have any insight on dealing with it. i was able to get around it via throttling, but the next step might be to add some intelligence to the window size using the current CPU usage, # of open connections, etc.

  15. jaisen Says:

    @Josh, I haven’t tried to process 1000s of requests. Actually, my use case has been to fire off requests, do some processing, and come back to get the requests (hoping that they’re finished). I can imagine PHP’s curl_multi failing if you happen to do that :).

    I added your post to the list of resources at the end of the post.

  16. Marc Says:

    Josh, great that you have tested this bug queue… but have you thought about overloading machines? You should not open more than 6 requests (RFC) to one domain/site at once.

  17. Josh Fraser Says:

    Marc,

    Great point. I’ve been using it for fetching blog posts from unique servers, but that’s a good reminder for all of us to remember our manners.

  18. jaisen Says:

    @Marc, @Josh, I haven’t used it in scenarios with 100+ concurrent requests but if that was the use case then a combination of configurable limits on both ends would be a must have. Josh has 1/2 of this already implemented :).

  19. Konstantin Says:

    I think we have a little mistake in function storeResponses():
    curl_multi_remove_handle($this->mc, $done['handle']);
    It is not necessary to do this inside the loop over the properties. I just moved it outside, after the loop.

    Also, I investigated Josh’s code at the page: http://onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/
    I improved storeResponses by adding, before curl_multi_remove_handle, a loop over the requests that have not loaded yet which re-adds them to curl_multi. Please look at the code below:

    function storeResponses() {
    while (($done = curl_multi_info_read($this->mc))) {
    $ch = $done['handle'];
    $key = (string)$ch;
    $info = curl_getinfo($ch);
    if ($info['http_code'] != 200) {
    // errors handling
    }
    $this->responses[$key]['data'] = curl_multi_getcontent($ch);
    $this->responses[$key]['info'] = $info;
    foreach ($this->properties as $name => $const) {
    $this->responses[$key][$name] = curl_getinfo($ch, $const);
    }
    foreach ($this->requests as $rKey => $rCurlHandle) {
    if (!isset($this->responses[$rKey])) {
    curl_multi_add_handle($this->mc, $rCurlHandle);
    }
    }
    curl_multi_remove_handle($this->mc, $ch);
    curl_close($ch);
    }
    return true;
    }

    Now it works much faster with 100+ requests.

  20. How to quickly integrate with Twitter’s OAuth API using PHP « Dogfeeds——IT Telescope Says:

    [...] Reuse the asynchronous/non-blocking curl library [...]

  21. Riyaz Says:

    Hey Jaisen,
    First off, thanks a lot for this class, it makes it very easy to use this open secret called curl. I had one question though - what time does EpiCurl’s “time” property return? My page is taking 10+ secs to load, but the “time” value being returned is 1 second. And to top it off, the “length” value is 0!

    Would appreciate any insight you have on how to measure curl calls…
    Thanks.

  22. Riyaz Says:

    Never mind… it turns out my code was using getimagesize() to resize image, and that was fetching each image from remote location. So, the 1 second “time” value was correct.

  23. Aaron Says:

    Nice example, thank you. This code looks as though it may have potential for what I am trying to accomplish which is perform an http request for xml files created via a database on another server. What would be the advantage of using multi_curl versus the http_request class/function in PHP?

  24. How to quickly integrate with Twitter’s OAuth API using PHP :: Jaisen Mathai Says:

    [...] Reuse the asynchronous/non-blocking curl library [...]



About this site:
This is my (Jaisen Mathai) personal site for potential employers who want to see my resume or portfolio. My ideal job would be to work as a Web developer on a large scale consumer website. My experience is in using PHP, MySQL, Ajax and JSON. I really enjoy creative brainstorming...taking a problem apart and narrowing 100 solutions down to the best one.

Thanks for stopping by. Be sure to drop me a line.