Bug #72640 Regexp with possessive/atomic grouping 25x slower compared to 5.6.x
Submitted: 2016-07-21 13:10 UTC
Modified: 2016-07-21 22:20 UTC
From: arjen at parse dot nl
Assigned:
Status: Not a bug
Package: PCRE related
PHP Version: 7.0.9
OS: Linux
Private report: No
CVE-ID: None
 [2016-07-21 13:10 UTC] arjen at parse dot nl
Description:
------------
After reading http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016 we checked our own implementation. We did not use possessive quantifiers or atomic grouping in our trim regexps, so one trim call on 20000 whitespace characters followed by a non-whitespace character took 6 seconds. After updating the regexp this dropped to 2 seconds: better, but still not safe enough.

So we checked the performance in different PHP versions on 3v4l.org, and 5.6 appeared to be 25x faster: https://3v4l.org/SPEXB/perf#tabs
Avg 0.066 for 5.6.x compared to 1.670 for 7.0.x

Test script:
---------------
<?php

preg_replace('~(?>[\pZ\pC]+)$~u', '', str_repeat("\u{2029}", 20000) .  'x');

Expected result:
----------------
Run-time comparable to 5.6.x

Actual result:
--------------
25x slowdown.
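
The comparison described above can be reproduced locally with a small timing harness. This is only a sketch: time_replace() is a hypothetical helper, the non-atomic pattern is an assumed stand-in for the original trim regexp (which the report does not show), and the input length is reduced from the report's 20000 so the run stays short. The separator is written as raw UTF-8 bytes so the script parses identically on 5.6 and 7.x.

```php
<?php
// Hypothetical helper (not from the report): time a single preg_replace() call.
function time_replace($pattern, $subject)
{
    $start = microtime(true);
    $result = preg_replace($pattern, '', $subject);
    return array($result, microtime(true) - $start);
}

// U+2029 PARAGRAPH SEPARATOR as raw UTF-8 bytes, followed by one non-whitespace char.
// The report used 20000 repetitions; 2000 is an arbitrary smaller size for a quick run.
$subject = str_repeat("\xE2\x80\xA9", 2000) . 'x';

$patterns = array(
    'greedy' => '~[\pZ\pC]+$~u',      // assumed non-atomic variant
    'atomic' => '~(?>[\pZ\pC]+)$~u',  // pattern from the test script
);

foreach ($patterns as $name => $pattern) {
    list($result, $elapsed) = time_replace($pattern, $subject);
    // preg_replace() returns null when PCRE gives up, e.g. on pcre.backtrack_limit.
    $status = ($result === null) ? 'error ' . preg_last_error() : 'ok';
    printf("%-8s %.4f s  %s\n", $name, $elapsed, $status);
}
```

Timings depend on the PHP build, the PCRE version and whether the PCRE JIT is enabled, so the numbers are only comparable within one environment.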

 [2016-07-21 13:57 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2016-07-21 13:57 UTC] cmb@php.net
Having a closer look at the performance result shows that PHP
7.0.1 - 7.0.5 appear to be really slow, but that current versions
are not affected, so I'm closing this ticket.
 [2016-07-21 14:02 UTC] arjen at parse dot nl
Please take a look at the user time, not at system time.

Or confirm by running it at CLI:

time php -r 'preg_replace("~(?>[\pZ\pC]+)$~u", "", str_repeat("\u{2029}", 20000) .  "x");'

real	0m0.975s
user	0m0.973s
sys	0m0.000s
 [2016-07-21 14:22 UTC] cmb@php.net
-Status: Not a bug
+Status: Re-Opened
-Assigned To: cmb
+Assigned To:
 [2016-07-21 14:22 UTC] cmb@php.net
> Please take a look at the user time, not at system time.

Ah, you're right!
 [2016-07-21 14:54 UTC] arjen at parse dot nl
\u{2029} is a PHP 7 feature, so the test on 5.6 is not the same. :-/

See https://3v4l.org/7Wfc5/perf#tabs

PHP 7 is faster. The regexp is still slow, but there is no regression.

Please close :X
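
To illustrate why the 5.6 numbers were not comparable: in a double-quoted string PHP 5.6 leaves \u{2029} as eight literal ASCII bytes, none of which match [\pZ\pC], so the regexp fails almost immediately. A minimal sketch of the difference, using raw UTF-8 bytes as a version-independent way to write U+2029 (the variable names are illustrative only):

```php
<?php
// U+2029 PARAGRAPH SEPARATOR as raw UTF-8 bytes; parses the same on PHP 5.6 and 7.x.
$ps = "\xE2\x80\xA9";

// Single-quoted, so this is the literal 8-byte text on every version. It is
// what PHP 5.6 produces when a script writes "\u{2029}" in double quotes.
$literal = '\u{2029}';

var_dump(strlen($ps));                             // int(3)
var_dump(strlen($literal));                        // int(8)
var_dump(preg_match('~^[\pZ\pC]+$~u', $ps));       // int(1)
var_dump(preg_match('~^[\pZ\pC]+$~u', $literal));  // int(0)
```

Building the subject from $ps gives both versions the same 20000-separator input to work on.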
 [2016-07-21 14:58 UTC] bwoebi@php.net
-Status: Re-Opened
+Status: Not a bug
 [2016-07-21 14:58 UTC] bwoebi@php.net
I recommend using (*SKIP) though - this is the fastest you'll get.

preg_replace('~[\pZ\pC]+(*SKIP)$~u', '', str_repeat("\u{2029}", 20000) .  'x');
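
A quick way to check that the (*SKIP) variant strips the same text as the atomic-group pattern is to run both over a few small subjects. The sketch below assumes a PCRE build that supports backtracking verbs; strip_trailing() and the sample subjects are made up for illustration. The likely reason for the speed difference is that the atomic group only stops backtracking inside one attempt, so the engine still retries from every later start position, whereas (*SKIP) moves the next attempt past the whole run once $ fails.

```php
<?php
// U+2029 PARAGRAPH SEPARATOR as raw UTF-8 bytes (works on PHP 5.6 and 7.x).
$ps = "\xE2\x80\xA9";

// Hypothetical helper for this sketch: remove whatever the pattern matches.
function strip_trailing($pattern, $subject)
{
    return preg_replace($pattern, '', $subject);
}

$atomic = '~(?>[\pZ\pC]+)$~u';     // pattern from the test script
$skip   = '~[\pZ\pC]+(*SKIP)$~u';  // pattern suggested in the comment above

$subjects = array(
    'trailing run'   => 'ab' . $ps . $ps,  // run at the end: should be stripped
    'run then x'     => $ps . $ps . 'x',   // run not at the end: should stay
    'inner run only' => 'a' . $ps . 'b',   // no trailing run: should stay
);

foreach ($subjects as $label => $subject) {
    $a = strip_trailing($atomic, $subject);
    $s = strip_trailing($skip, $subject);
    // bin2hex() makes the invisible separator bytes visible in the output.
    printf("%-16s atomic=%s skip=%s\n", $label, bin2hex($a), bin2hex($s));
}
```

Both columns should show the same bytes for every subject; timing the two patterns on the 20000-character input is then a separate measurement.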
 [2016-07-21 22:18 UTC] leigh@php.net
-Status: Not a bug
+Status: Open
-Package: PCRE related
+Package: Performance problem
 [2016-07-21 22:18 UTC] leigh@php.net
Not PCRE related.

Running it without the concatenated 'x' results in normal performance:

https://3v4l.org/QbJP1/perf#tabs

<?php

preg_replace('~(?>[\pZ\pC]+)$~u', '', str_repeat("\u{2029}", 20000));
 [2016-07-21 22:19 UTC] leigh@php.net
-Status: Open
+Status: Closed
-Package: Performance problem
+Package: PCRE related
-Assigned To:
+Assigned To: leigh
 [2016-07-21 22:19 UTC] leigh@php.net
Ignore me. Backref. I'm an idiot.
 [2016-07-21 22:20 UTC] leigh@php.net
-Status: Closed
+Status: Not a bug
-Assigned To: leigh
+Assigned To:
 