Bug #73412 url-wrappers in PHP7 terrible slow
Submitted: 2016-10-28 14:06 UTC Modified: 2016-10-31 09:50 UTC
From: spam2 at rhsoft dot net Assigned:
Status: Not a bug Package: Performance problem
PHP Version: 7.0.12 OS: Linux
Private report: No CVE-ID: None
 [2016-10-28 14:06 UTC] spam2 at rhsoft dot net
Description:
------------
Can someone explain why PHP is so terribly slow with the test script below? It makes 150 HTTP calls against a local webserver where even the 50 most expensive CMS requests take only 5037 us. (The header I parse out, to measure the difference between the CMS, a .htm file, and the same file with a .php extension, comes from httpd: Header always set "X-Response-Time" "%D us".)

40 seconds for 150 requests - on a machine which can handle 1200 requests per second with "ab -c 20 -n 100000 http://localhost/"?

[harry@srv-rhsoft:/downloads]$ time php -r "get_headers('http://corecms/static.htm');"
real    0m0.040s
user    0m0.026s
sys     0m0.013s

[harry@srv-rhsoft:/downloads]$ time wget http://corecms/static.htm
real    0m0.006s
user    0m0.002s
sys     0m0.003s


Test script:
---------------
[harry@srv-rhsoft:/downloads]$ time php /www/corecms.rhsoft.net/response-times.php
COUNT:       50
DYNAMIC CMS: 5037 us
STATIC HTM:  346 us
STATIC PHP:  1645 us
FACTOR D/S:  14.6
FACTOR P/S:  4.8
FACTOR D/P:  3.1

real    0m39.557s
user    0m0.059s
sys     0m0.043s
________________________________________

[harry@srv-rhsoft:/downloads]$ cat /www/corecms.rhsoft.net/response-times.php
#!/usr/bin/php
<?php
 if(PHP_SAPI == 'cli')
 {
  $dynamic  = array();
  $static   = array();
  $staticp  = array();
  $requests = 50;

  for($count=1; $count<=$requests; $count++)
  {
   $dynamic[] = get_repsonse_time('http://corecms/index.php');
   $static[]  = get_repsonse_time('http://corecms/static.htm');
   $staticp[] = get_repsonse_time('http://corecms/static.php');
  }
  $dynamic_sum = array_sum($dynamic);
  $static_sum  = array_sum($static);
  $staticp_sum  = array_sum($staticp);
  $dynamic_avg = $dynamic_sum / count($dynamic);
  $static_avg  = $static_sum / count($static);
  $staticp_avg = $staticp_sum / count($staticp);
  echo 'COUNT:       ' . count($dynamic) . "\n";
  echo 'DYNAMIC CMS: ' . round($dynamic_avg) . " us\n";
  echo 'STATIC HTM:  ' . round($static_avg) . " us\n";
  echo 'STATIC PHP:  ' . round($staticp_avg) . " us\n";
  echo 'FACTOR D/S:  ' . round($dynamic_avg / $static_avg, 1) . "\n";
  echo 'FACTOR P/S:  ' . round($staticp_avg / $static_avg, 1) . "\n";
  echo 'FACTOR D/P:  ' . round($dynamic_avg / $staticp_avg, 1) . "\n";
 }

 function get_repsonse_time($url)
 {
  $response = get_headers($url, 1);
  $response = explode('D=', $response['X-Response-Time']);
  return (int)trim(str_replace(' us', '', $response[1]));
 }
?>


 [2016-10-30 18:31 UTC] spam2 at rhsoft dot net
The same applies to file_get_contents(), and to HEAD requests as well as GET; the same logic with the curl extension is *magnitudes* faster. In fact this is a PHP 7 regression, because I just had to rewrite some fuzzing-request scripts which run the whole day long and originally had no problem with PHP 5.x.

In short: the url-wrappers in PHP 7 are broken and unusable as soon as a noticeable number of requests triggers them.

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $dst_url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0');
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD');
$buffer = curl_exec($ch);
curl_close($ch);
 [2016-10-30 18:33 UTC] spam2 at rhsoft dot net
-Summary: get_headers terrible slow +Summary: url-wrappers in PHP7 terrible slow
 [2016-10-30 18:33 UTC] spam2 at rhsoft dot net
changed subject
 [2016-10-30 18:56 UTC] spam2 at rhsoft dot net
Just switch between curl and pure PHP and you get a factor of 57 with *just* 57 requests; that's ridiculous, especially since the curl variant uses many more function calls, and within 36 seconds any sane software can make thousands of HTTP requests.
_______________________________

$use_url  = true;

real    0m0.647s
user    0m0.057s
sys     0m0.041s
_______________________________

$use_url  = false;

real    0m36.959s
user    0m0.047s
sys     0m0.051s
_______________________________

#!/usr/bin/php
<?php
 if(PHP_SAPI == 'cli')
 {
  $use_url  = false;
  $dynamic  = array();
  $static   = array();
  $staticp  = array();
  $requests = 50;

  for($count=1; $count<=$requests; $count++)
  {
   $dynamic[] = get_repsonse_time('http://corecms/index.php');
   $static[]  = get_repsonse_time('http://corecms/static.htm');
   $staticp[] = get_repsonse_time('http://corecms/static.php');
  }
  $dynamic_sum = array_sum($dynamic);
  $static_sum  = array_sum($static);
  $staticp_sum  = array_sum($staticp);
  $dynamic_avg = $dynamic_sum / count($dynamic);
  $static_avg  = $static_sum / count($static);
  $staticp_avg = $staticp_sum / count($staticp);
  echo 'COUNT:       ' . count($dynamic) . "\n";
  echo 'DYNAMIC CMS: ' . round($dynamic_avg) . " us\n";
  echo 'STATIC HTM:  ' . round($static_avg) . " us\n";
  echo 'STATIC PHP:  ' . round($staticp_avg) . " us\n";
  echo 'FACTOR D/S:  ' . round($dynamic_avg / $static_avg, 1) . "\n";
  echo 'FACTOR P/S:  ' . round($staticp_avg / $static_avg, 1) . "\n";
  echo 'FACTOR D/P:  ' . round($dynamic_avg / $staticp_avg, 1) . "\n";
 }

 function get_repsonse_time($url)
 {
  global $use_url;
  switch($use_url)
  {
   case false:
     $response = get_headers($url, 1);
     $response = explode('D=', $response['X-Response-Time']);
     return (int)trim(str_replace(' us', '', $response[1]));
     break;
   case true:
     $ch = curl_init();
     curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
     curl_setopt($ch, CURLOPT_URL, $url);
     curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
     curl_setopt($ch, CURLOPT_HEADER, true);
     curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD');
     $response = curl_exec($ch);
     curl_close($ch);
     $response = explode("\n", $response);
     foreach($response as $response_line)
     {
      if(strpos($response_line, 'X-Response-Time') !== false)
      {
       $response = explode('D=', $response_line);
       return (int)trim(str_replace(' us', '', $response[1]));
       break;
      }
     }
     break;
  }
 }
?>
 [2016-10-31 01:52 UTC] rasmus@php.net
-Status: Open +Status: Analyzed
 [2016-10-31 01:52 UTC] rasmus@php.net
I see a couple of issues with your sample code here, the biggest being that you are not comparing the same things at all. get_headers(), as per the docs, uses a GET request by default, but your curl example uses a HEAD request. Obviously a HEAD request is going to be much quicker.

Your php get_headers() vs. wget timing isn't valid either since obviously php cli has way more startup work to do than wget.
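To take interpreter startup out of the picture entirely, the call can be timed inside one PHP process. A minimal sketch; the helper name and the placeholder callable are illustrative, not from this report:

```php
<?php
// In-process timer: excludes PHP interpreter startup, which dominates a
// `time php -r ...` vs `time wget` comparison for a single request.
function avg_seconds(callable $fn, $n)
{
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $n;
}

// Cheap placeholder callable; substitute e.g.
//   function () { return get_headers('http://corecms/static.htm'); }
// to measure the wrapper itself without startup overhead.
printf("%.6f s per call\n", avg_seconds(function () { return strlen('x'); }, 1000));
```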

I just wrote a quick test and I don't see a drastic difference between curl and the url wrappers on Linux. I got this output:

4.2943890094757
5.9716069698334

So yes, get_headers() is a bit slower, but certainly not orders of magnitude like you are claiming. This is my test script:

<?php
function curl_headers($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    $r = curl_exec($ch);
    curl_close($ch);
    return explode("\n", trim($r));
}

$i=0;
$start = microtime(true);
while($i++<1000) {
    $headers = curl_headers('http://htpc/index.html');
}
echo (microtime(true) - $start)."\n";
$i=0;
stream_context_set_default(['http' => ['method' => 'HEAD']]);
$start = microtime(true);
while($i++<1000) {
    $headers = get_headers('http://htpc/index.html');
}
echo (microtime(true) - $start)."\n";
 [2016-10-31 05:21 UTC] spam2 at rhsoft dot net
Hi

Thanks for the feedback - please don't miss the part below the line, maybe that's the reason.

We are talking about 0.647s versus 36s for just 50 requests here, while the libcurl variant takes 29s for FIVE THOUSAND requests - that is hardly just the difference between GET and HEAD. (BTW: why in the world does get_headers() make GET requests? http://php.net/manual/de/function.get-headers.php doesn't say so, and given what the function does one would hardly expect GET - note that even with <Limit GET POST> you can't disable HEAD requests.) The point is that it feels like one request per second, which is not far from the truth, and I have a script doing around 100000 requests which was fine with 5.x and plain file_get_contents()
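For what it's worth, the wrapper's default request method can be overridden for the whole process, so the GET-vs-HEAD difference can be ruled out without switching to curl. A minimal sketch:

```php
<?php
// Make the http:// wrapper send HEAD instead of its default GET, so a
// get_headers() run and a curl HEAD run compare the same request type.
stream_context_set_default(['http' => ['method' => 'HEAD']]);

// Inspect the default context to confirm the override before benchmarking.
$opts = stream_context_get_options(stream_context_get_default());
echo $opts['http']['method'], "\n";   // HEAD
```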
____________________________________

Is your PHP build a properly hardened build?

I saw, for example, when Fedora 23 switched the whole distribution to hardened builds (https://fedoraproject.org/wiki/Changes/Harden_All_Packages), that grep got magnitudes slower; with F24 it is as fast as before and still a hardened build, so I guess there was some bug in the code which made it behave that way.

[root@srv-rhsoft:~]$ /usr/bin/hardening-check /usr/bin/php
/usr/bin/php:
 Position Independent Executable: yes
 Stack protected: yes
 Fortify Source functions: yes (some protected functions found)
 Read-only relocations: yes
 Immediate binding: yes

If that's the reason, may I suggest that developers only test with hardened builds, because these days no one should run any non-PIE/non-FULL-RELRO code except broken software which doesn't build properly and can't yet be replaced.
 [2016-10-31 05:47 UTC] rasmus@php.net
-Status: Analyzed +Status: Not a bug
 [2016-10-31 05:47 UTC] rasmus@php.net
No, I test clean PHP binaries. No 3rd-party hacks. And like I showed with my example, there is no major performance difference between url wrappers and curl.
 [2016-10-31 06:01 UTC] spam2 at rhsoft dot net
> No, I test clean PHP binaries. No 3rd-party hacks

If you think a position-independent executable is a 3rd-party hack, why don't you hand the bug report over to someone who takes it seriously instead of closing it?

All the PHP 5 builds of the last years were PIE builds too, without this performance problem.
 [2016-10-31 06:04 UTC] rasmus@php.net
I assumed you meant http://www.hardened-php.net/
My build is using PIC and doesn't show any performance issues.
Do some work to figure out what is going on yourself. With such a drastic difference it should be obvious in a "perf record" profile.
 [2016-10-31 06:20 UTC] spam2 at rhsoft dot net
How can you assume http://www.hardened-php.net/ when I pointed you to https://fedoraproject.org/wiki/Changes/Harden_All_Packages and /usr/bin/hardening-check and was talking about PROPER hardening?

Is your php cli binary *really* PIE? That is *not* the same as PIC.

[root@srv-rhsoft:~]$ dnf info hardening-check
Last metadata expiration check: 7:11:06 ago on Mon Oct 31 00:09:04 2016.
Installed Packages
Name        : hardening-check
Arch        : noarch
Epoch       : 0
Version     : 2.5
Release     : 4.fc24
Size        : 37 k
Repository  : @System
Summary     : Tool to check ELF for being built hardened
URL         : http://packages.debian.org/source/sid/hardening-wrapper
License     : GPLv2+
Description : hardening-check is a tool to check whether an already compiled ELF file
            : was built using hardening flags.
            :
            : It checks, using readelf, for these hardening characteristics:
            :
            :   * Position Independent Executable
            :   * Stack protected
            :   * Fortify source functions
            :   * Read-only relocations
            :   * Immediate binding
 [2016-10-31 06:33 UTC] rasmus@php.net
I ran the test on mod_php which is using libphp7.so which is compiled with PIC. PIC is the shared library equivalent of PIE.

But yes, as a matter of fact my cli php is position-independent as well:

11:28pm thinkpad:~/php-src> sapi/cli/php -v
PHP 7.0.14-dev (cli) (built: Oct 27 2016 21:27:10) ( NTS )
Copyright (c) 1997-2016 The PHP Group
Zend Engine v3.0.0, Copyright (c) 1998-2016 Zend Technologies
    with Zend OPcache v7.0.13-dev, Copyright (c) 1999-2016, by Zend Technologies

11:28pm thinkpad:~/php-src> hardening-check sapi/cli/php
sapi/cli/php:
 Position Independent Executable: yes
 Stack protected: yes
 Fortify Source functions: yes
 Read-only relocations: yes
 Immediate binding: yes

11:29pm thinkpad:~/php-src> sapi/cli/php /var/www/html/rr.php
4.9895119667053
5.8754270076752

And as you can see, from the cli the difference is still small. Did you actually run my script?
 [2016-10-31 07:35 UTC] spam2 at rhsoft dot net
Same behavior with your code (I replaced the URL and changed 1000 to 20, because otherwise I would have to wait until next week):

[harry@srv-rhsoft:/downloads]$ php test.php
0.081186056137085
3.1035280227661

The same when it runs within the webserver.

Interesting that the profiling script used between "make prof-gen" and "make prof-use", which also does a ton of file_get_contents() calls against a temporarily started webserver on 127.0.0.1:9000, does not suffer from this problem (neither within the build process nor with the fallback using the installed binaries), while I have several scripts using file_get_contents() or get_headers() which got magnitudes slower.

One of them lists all files recursively and makes blind requests to every possible URL to find includes which are not properly protected against direct calls; that one would take months to finish, while with curl it is pretty fast.
 [2016-10-31 07:44 UTC] nikic@php.net
Could you please provide an `strace` trace for what happens in the get_headers() case?
 [2016-10-31 08:03 UTC] rasmus@php.net
I think a perf report would be more useful, but if you do an strace, use these flags: strace -Ff -tt -T -o out.txt php -r "get_headers('http://corecms/static.htm');"
 [2016-10-31 08:10 UTC] spam2 at rhsoft dot net
I have no experience with the perf tools, but if you instruct me...

In the meantime, two straces:

strace -Ff -tt -T -o out.txt php -r "get_headers('http://corecms/static.htm');"
http://access.thelounge.net/harry/get_headers_strace_small.txt

strace -Ff -tt -T -o out.txt php /www/corecms.rhsoft.net/response-times.php
http://access.thelounge.net/harry/get_headers_strace.txt
 [2016-10-31 08:15 UTC] nikic@php.net
Okay, so the trace includes a bunch of 3-second timeouts like

18606 09:07:06.754991 poll([{fd=3, events=POLLIN}], 1, 3000) = 0 (Timeout) <3.003119>

That is clearly the reason for the immense slowdown. Now we need to figure out the cause :)
 [2016-10-31 08:22 UTC] spam2 at rhsoft dot net
Since the script now has a $use_url switch with which this does not happen - you can make 5000 requests instead of 5 in the same time with nothing else changed - it is hardly something in the environment or on the contacted webserver (a local machine); I am pretty sure something is wrong on the PHP side.

A single call returns immediately:
__________________________

[harry@srv-rhsoft:/mnt/data/downloads]$ time php -r "get_headers('http://corecms/index.php');"

real    0m0.057s
user    0m0.035s
sys     0m0.015s
[harry@srv-rhsoft:/mnt/data/downloads]$ time php -r "get_headers('http://corecms/static.htm');"

real    0m0.056s
user    0m0.041s
sys     0m0.013s
[harry@srv-rhsoft:/mnt/data/downloads]$ time php -r "get_headers('http://corecms/static.php');"

real    0m0.043s
user    0m0.032s
sys     0m0.009s
 [2016-10-31 09:22 UTC] nikic@php.net
Looking more closely, this is the relevant part of the trace:


18606 09:06:45.362957 socket(AF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3 <0.000016>
18606 09:06:45.363009 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 <0.000016>
18606 09:06:45.363062 gettimeofday({1477901205, 363080}, NULL) = 0 <0.000015>
18606 09:06:45.363116 poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}]) <0.000014>
18606 09:06:45.363175 sendto(3, "\377\306\1\0\0\1\0\0\0\0\0\0\7corecms\6rhsoft\3net\0"..., 36, MSG_NOSIGNAL, NULL, 0) = 36 <0.000028>
18606 09:06:45.363245 poll([{fd=3, events=POLLIN}], 1, 3000) = 0 (Timeout) <3.003097>

As you can see, a DNS request for corecms.rhsoft.net is made against 127.0.0.1:53 -- but presumably there is no DNS server actually running there. The question is why this happens in one case but not the other.
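A quick way to confirm this from userland is to time name resolution separately from the HTTP request. A hedged sketch; the helper name is illustrative, and 'localhost' is a placeholder - substitute 'corecms' to reproduce the report:

```php
<?php
// Time the system resolver in isolation; gethostbyname() goes through the
// same libc resolver path the http:// wrapper uses for hostnames.
function resolve_timed($host)
{
    $start = microtime(true);
    $ip = gethostbyname($host);   // returns the input unchanged on failure
    return array($ip, microtime(true) - $start);
}

list($ip, $secs) = resolve_timed('localhost');
printf("localhost -> %s in %.4f s\n", $ip, $secs);
// If this alone takes ~3 s per call, the slowdown is the resolver
// (e.g. a dead nameserver at 127.0.0.1:53), not the url-wrapper itself;
// requesting the site by IP instead of hostname should then be fast.
```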
 [2016-10-31 09:50 UTC] spam2 at rhsoft dot net
Looks like PHP can't properly handle a DNS server with response-rate limiting, while the curl extension can.
 