php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #75488 Under heavy load PHP misses Opcache hits with no errors
Submitted: 2017-11-06 03:20 UTC Modified: 2018-05-07 14:13 UTC
Votes:4
Avg. Score:4.2 ± 0.8
Reproduced:2 of 2 (100.0%)
Same Version:2 (100.0%)
Same OS:2 (100.0%)
From: david at davidfavor dot com Assigned:
Status: Open Package: opcache
PHP Version: 7.2.5 OS: Ubuntu Artful
Private report: No CVE-ID: None
View Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
If you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: david at davidfavor dot com
New email:
PHP Version: OS:

 

 [2017-11-06 03:20 UTC] david at davidfavor dot com
Description:
------------
1) Create a simple hello.php (Hello World) file.

2) Run the following...

while : ; do curl -s -I -L https://foo.com/hello.php >/dev/null ; done

3) Track FPM access log.

First all's well. 100% cache hits + FPM access log shows 0.00% CPU time for script execution.

Then odd behavior begins.

First huge CPU time number begin, ranging from 1-10+ seconds.

Looking at Opcache stats, 99% hits for a few minutes.

Then cache hit rate begins to drop also.

I've setup opcache logging with log verbosity == 4 + nothing useful comes out.

No Apache to FPM communication errors logged.

No FPM errors logged (running in at debug level).

No FPM slow log entries.

I've gone through Opcache config + see no setting which might explain this.

I'm using the ondrej:php + ondrej:apache2 PPAs so,

Apache-2.4.29-0.1 + PHP-7.1.11-1 are my versions.

My... Command Line Kung Fu is strong... so if someone can suggest how to debug this deeper, let me know.

Test script:
---------------
<html>
 <head>
  <title>PHP Test</title>
 </head>
 <body>
 <?php echo '<p>Hello World</p>'; ?> 
 </body>
</html>

Expected result:
----------------
Expected result 100% cache hits.

Actual result:
--------------
Roughly 1 out of 30ish requests fail to come from cache.

Patches

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2017-11-11 16:39 UTC] david at davidfavor dot com
Steps to reproduce.

Config settings + opcache data links...

http://144.217.34.1/wptools/phpinfo.php
http://144.217.34.1/wptools/ocp-0.2.0.php

1) set opcache.ini values to never revalidate
#opcache.revalidate_freq=0
#opcache.validate_timestamps=0
#opcache.enable_file_override=0

lxd: net11-david-favor # service php7.1-fpm restart

lxd: net11-david-favor # truncate -s 0 /var/log/php7.1-fpm/access.log

2) Initiate a simple load test...

lxd: net11-david-favor # h2load -ph2c -t16 -c16 -m16 -n200000 https://net11.heartbeat.davidfavor.com/hello.php
... ... ...
finished in 11.29s, 17719.92 req/s, 2.06MB/s
requests: 200000 total, 200000 started, 200000 done, 200000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 200000 2xx, 0 3xx, 0 4xx, 0 5xx

This works as expected. 200000 started. 200000 successes. 0 failures.

3) Count log entries...

lxd: net11-david-favor # egrep -e hello.php /var/log/php7.1-fpm/access.log | wc -l
200000

4) Count log entries != 0.00%, so this could shows non-cached PHP executions. This count should be 1, as the .php file should run once + then be cached + come out of cache forever, since revalidate is off.

lxd: net11-david-favor # egrep -e hello.php /var/log/php7.1-fpm/access.log | egrep -v -e " 0.00%" | wc -l
5098

5) Now check opcache data...

Opcache shows 2 misses + 195K hits, which is wrong.

Should be 5098 misses + 194K hits.

This problem is reproducible 100% of the time.

6) After thought.

Now 3% breakage on 200K requests might seem inconsequential.

The problem occurs on high traffic sites, so say 100+ WordPress post slugs with sustained traffic of 50K-100K/second visits.

These are the numbers on the client site I host which surfaced the problem.

When this breakage begins on various page requests, which involve man .php files, the entire site begins to circle the drain quickly. Here's an example of why.

Consider these crazy numbers from a random snippet of FPM log entries.

1) hello.php 5.863 2048 170.56%
2) hello.php 1.468 2048 681.20%
3) hello.php 1.695 2048 589.97%
4) hello.php 0.217 2048 4608.29%
5) hello.php 0.228 2048 4385.96%
6) hello.php 0.148 2048 0.00%

These entries show milliseconds to execute + memory usage + CPU usage.

Item #1 - Seriously bad 5.8 seconds to finish, with low CPU usage. This means an FPM thread is tied up for 5.8 seconds, so site death will surely occur, with high traffic.

Item #4 + #5 - FPM threads finish quickly + they hammer the CPU.

Item #6 - What should occur with every request. Fast execution time + 0.00% CPU usage, as the only operation should be an opcache lookup + data return.

7) WordPress workaround.

In WordPress the work around is to use a caching plugin which utilizes mod_rewrite correctly.

Currently WP Super Cache is broken, so files are returned by Apache from file cache, then the original request leaks through to PHP, so site dies because of this bug.

WP Fastest Cache does fix this problem, because files are returned by Apache from file cache + them short circuited/stopped, so request terminates without hitting PHP.

High traffic PHP sites must work around this bug to survive right now, which requires a good bit of expertise to resolve for non-WordPress sites, which have no simple mod_rewrite caching options.

I'm happy to provide any developer with root ssh access to the LXD container where this bug can be worked + fixed.
 [2017-11-29 15:34 UTC] david at davidfavor dot com
-PHP Version: 7.1.11 +PHP Version: 7.1.12
 [2017-11-29 15:34 UTC] david at davidfavor dot com
I've bumped version from 7.11 to 7.12, as problem persists.

lxd: net11-david-favor # php --version
PHP 7.1.12-1+ubuntu17.04.1+deb.sury.org+1 (cli) (built: Nov 29 2017 10:04:01) ( NTS )
Copyright (c) 1997-2017 The PHP Group
Zend Engine v3.1.0, Copyright (c) 1998-2017 Zend Technologies
    with Zend OPcache v7.1.12-1+ubuntu17.04.1+deb.sury.org+1, Copyright (c) 1999-2017, by Zend Technologies
    with Xdebug v2.5.5, Copyright (c) 2002-2017, by Derick Rethans
 [2017-12-06 18:14 UTC] david at davidfavor dot com
-PHP Version: 7.1.12 +PHP Version: 7.2.0
 [2017-12-06 18:14 UTC] david at davidfavor dot com
Bumping version from 7.1.12 to 7.2.0 as problem persists.

lxd: net11-ubuntu-zesty-php72 # php --version
PHP 7.2.0-1+ubuntu17.04.1+deb.sury.org+1 (cli) (built: Nov 30 2017 13:59:08) ( NTS )
Copyright (c) 1997-2017 The PHP Group
Zend Engine v3.2.0, Copyright (c) 1998-2017 Zend Technologies
    with Zend OPcache v7.2.0-1+ubuntu17.04.1+deb.sury.org+1, Copyright (c) 1999-2017, by Zend Technologies
 [2017-12-06 18:18 UTC] david at davidfavor dot com
IP has changed to 144.217.34.13 for PHP-7.2 container.
 [2018-01-15 16:05 UTC] david at davidfavor dot com
-PHP Version: 7.2.0 +PHP Version: 7.2.1
 [2018-01-15 16:05 UTC] david at davidfavor dot com
Bumping version to 7.2.1 as problem still exits.
 [2018-01-15 16:06 UTC] david at davidfavor dot com
This is a huge problem for high traffic sites using PHP for dynamic content generation.

Be great if someone knowledgable about Opcache could take a look at this + determine a fix or work around.

Thanks.
 [2018-03-07 13:51 UTC] david at davidfavor dot com
-Operating System: Ubuntu Zesty +Operating System: Ubuntu Artful -PHP Version: 7.2.1 +PHP Version: 7.2.2
 [2018-03-07 13:51 UTC] david at davidfavor dot com
Problem still exists.

Bumping PHP + OS versions.
 [2018-03-26 14:28 UTC] david at davidfavor dot com
-PHP Version: 7.2.2 +PHP Version: 7.2.3
 [2018-03-26 14:28 UTC] david at davidfavor dot com
Bug persists.

Bumping PHP level.
 [2018-04-27 15:28 UTC] nikic@php.net
It's not immediately obvious to me that there is any bug here. Why do you think that any log entry without 0.00% CPU usage indicates an opcache miss? Execution time and CPU load vary and your 99-percentile load is going to be significantly higher than your average load. Especially as opcache itself only reports 2 misses, I'd assume that at least this particular case is simply due to statistical variation.

Now, the actual production case you describe does look like an actual problem. This kind of behavior should not occur during normal operation, but this would be fairly typical during an opcache restart. Do these occurrences coincide with anything that could trigger an opcache reset (files being deployed, fpm reloaded, etc.)?
 [2018-05-07 14:12 UTC] david at davidfavor dot com
When Opcache is working correctly CPU usage should be near 0%, as only a lookup is done + code returned.

When Opcache shows 100% load, Opcache is failing. This can easily be seen on machines running massive disk I/O as the time required to load + compile PHP can be many seconds.

I'd expect Opcache to return 0% all the time, it it were working correctly.
 [2018-05-07 14:13 UTC] david at davidfavor dot com
-PHP Version: 7.2.3 +PHP Version: 7.2.5
 [2018-05-07 14:13 UTC] david at davidfavor dot com
Bumping version to 7.2.5 as problem persists.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sun Sep 08 06:01:27 2024 UTC