Bug #72640 Regexp with possessive/atomic grouping 25x slower compared to 5.6.x
Submitted: 2016-07-21 13:10 UTC
Modified: 2016-07-21 22:20 UTC
From: arjen at parse dot nl
Assigned: (none)
Status: Not a bug
Package: PCRE related
PHP Version: 7.0.9
OS: Linux
Private report: No
CVE-ID: None

 [2016-07-21 13:10 UTC] arjen at parse dot nl
Description:
------------
After reading http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016 we checked our own implementation. Our trim regexps did not use possessive or atomic grouping, so a single trim call on 20000 whitespace characters followed by one non-whitespace character took 6 seconds. After updating the regexps to use atomic grouping, this dropped to 2 seconds: better, but still not safe enough.

So we compared performance across PHP versions on 3v4l.org, where 5.6 appeared to be 25x faster: https://3v4l.org/SPEXB/perf#tabs
Average: 0.066 s for 5.6.x compared to 1.670 s for 7.0.x.

Test script:
---------------
<?php

preg_replace('~(?>[\pZ\pC]+)$~u', '', str_repeat("\u{2029}", 20000) .  'x');

Expected result:
----------------
Run time comparable to 5.6.x.

Actual result:
--------------
25x slowdown.
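Independently of any version comparison, a simplified cost model (an assumption for illustration, not a description of PCRE's actual internals) suggests why atomic grouping only cut the time from 6 to 2 seconds. In an unanchored search, the engine retries the pattern at every start offset inside the whitespace run, and each attempt consumes the rest of the run before $ fails at the 'x'. The hypothetical function `charactersConsumed()` counts those consumed characters: with one-character bump-along the total is n(n+1)/2, i.e. 200,010,000 for n = 20000, while resuming after the consumed run (the effect of a verb such as (*SKIP)) needs only n. The non-atomic pattern additionally backtracks inside each attempt, but the growth stays quadratic.

```php
<?php
// Simplified cost model, NOT PCRE's real internals: count the subject
// characters consumed by [\pZ\pC]+ while an unanchored search for
// "run of class characters, then end of subject" scans a run of $n class
// characters that is followed by one non-class character ('x').
//
// $useSkip = false: after each failed attempt, bump along by one character
//                   (the behaviour of (?>[\pZ\pC]+)$)
// $useSkip = true:  resume after the consumed run (the effect of (*SKIP))
function charactersConsumed(int $n, bool $useSkip): int
{
    $consumed = 0;
    $start = 0;
    while ($start < $n) {
        // The atomic group consumes everything from $start to the end of the run...
        $consumed += $n - $start;
        // ...then $ fails at 'x', and the next attempt begins here:
        $start = $useSkip ? $n : $start + 1;
    }
    return $consumed;
}

printf("bump-along: %d\n", charactersConsumed(20000, false)); // 200010000
printf("(*SKIP):    %d\n", charactersConsumed(20000, true));  // 20000
```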

History

 [2016-07-21 13:57 UTC] cmb@php.net
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2016-07-21 13:57 UTC] cmb@php.net
A closer look at the performance results shows that PHP
7.0.1 to 7.0.5 appear to be really slow, but that current versions
are not affected, so I'm closing this ticket.
 [2016-07-21 14:02 UTC] arjen at parse dot nl
Please take a look at the user time, not at system time.

Or confirm by running it on the CLI:

time php -r 'preg_replace("~(?>[\pZ\pC]+)$~u", "", str_repeat("\u{2029}", 20000) .  "x");'

real	0m0.975s
user	0m0.973s
sys	0m0.000s
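The same user-time figure can also be taken in-process. A minimal sketch, assuming a platform (such as Linux) where `getrusage()` reports `ru_utime`; the helper names `userCpuSeconds()` and `timeUserCpu()` are hypothetical:

```php
<?php
// Sketch: measure the user CPU time spent in the report's test pattern,
// in-process, via getrusage(). userCpuSeconds() and timeUserCpu() are
// hypothetical helper names; ru_utime is assumed available (e.g. Linux).

function userCpuSeconds(): float
{
    $r = getrusage();
    return $r['ru_utime.tv_sec'] + $r['ru_utime.tv_usec'] / 1e6;
}

function timeUserCpu(callable $fn): float
{
    $before = userCpuSeconds();
    $fn();
    return userCpuSeconds() - $before;
}

// 20000 x U+2029 PARAGRAPH SEPARATOR, then one non-whitespace character.
$subject = str_repeat("\u{2029}", 20000) . 'x';

$elapsed = timeUserCpu(function () use ($subject) {
    preg_replace('~(?>[\pZ\pC]+)$~u', '', $subject);
});

printf("user CPU time: %.3f s\n", $elapsed);
```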
 [2016-07-21 14:22 UTC] cmb@php.net
-Status: Not a bug +Status: Re-Opened -Assigned To: cmb +Assigned To:
 [2016-07-21 14:22 UTC] cmb@php.net
> Please take a look at the user time, not at system time.

Ah, you're right!
 [2016-07-21 14:54 UTC] arjen at parse dot nl
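\u{2029} is a PHP 7 feature: in 5.6 the literal "\u{2029}" is not an escape sequence, so it stays as the eight characters \u{2029}, none of which match [\pZ\pC]. The 5.6 test is therefore not the same. :-/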

See https://3v4l.org/7Wfc5/perf#tabs

PHP 7 is faster. The regexp is still slow, but there is no regression.

Please close :X
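To build a subject that means the same thing in 5.6 and 7.x, the UTF-8 bytes of U+2029 can be spelled out with `\x` escapes, which both versions understand. A minimal sketch (variable names are illustrative only):

```php
<?php
// "\u{2029}" is an escape sequence only in PHP >= 7.0; in 5.6 the literal
// stays as the eight characters \u{2029}. Spelling out the UTF-8 bytes of
// U+2029 PARAGRAPH SEPARATOR yields the same string in both versions.
$ps = "\xE2\x80\xA9";

var_dump(strlen($ps));                 // int(3)
var_dump(preg_match('~^\pZ$~u', $ps)); // int(1): U+2029 is in category Zp

// Same subject as the report's test script, built portably.
$subject = str_repeat($ps, 20000) . 'x';
```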
 [2016-07-21 14:58 UTC] bwoebi@php.net
-Status: Re-Opened +Status: Not a bug
 [2016-07-21 14:58 UTC] bwoebi@php.net
I recommend using (*SKIP), though; this is the fastest you'll get.

preg_replace('~[\pZ\pC]+(*SKIP)$~u', '', str_repeat("\u{2029}", 20000) .  'x');
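A minimal sketch of a trailing-whitespace trim built on the suggested pattern; the helper name `rtrimUnicode()` is hypothetical. Per the PCRE documentation, when matching backtracks into (*SKIP) in an unanchored pattern, the next attempt starts at the position where (*SKIP) was encountered rather than one character further, so each character of the run is consumed once instead of once per start offset:

```php
<?php
// Sketch: strip trailing Unicode whitespace/control characters with the
// (*SKIP) pattern suggested above. rtrimUnicode() is a hypothetical helper.
// [\pZ\pC] = separators (Z*) plus "other" characters (C*: controls,
// format characters, unassigned code points, ...).

function rtrimUnicode(string $s): string
{
    // If $ fails after [\pZ\pC]+ has consumed a run, backtracking into
    // (*SKIP) makes the next attempt start where (*SKIP) was reached,
    // i.e. after the run, instead of one character into it.
    $result = preg_replace('~[\pZ\pC]+(*SKIP)$~u', '', $s);
    if ($result === null) {
        // e.g. invalid UTF-8 in $s
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

$pathological = str_repeat("\u{2029}", 20000) . 'x';

$t0 = microtime(true);
$out = rtrimUnicode($pathological);
printf("(*SKIP) version: %.4f s\n", microtime(true) - $t0);

var_dump($out === $pathological);           // bool(true): nothing trailing to strip
var_dump(rtrimUnicode("abc \u{2029}\t\n")); // string(3) "abc"
```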
 [2016-07-21 22:18 UTC] leigh@php.net
-Status: Not a bug +Status: Open -Package: PCRE related +Package: Performance problem
 [2016-07-21 22:18 UTC] leigh@php.net
Not PCRE related.

Running it without concatenating the trailing 'x' results in normal performance:

https://3v4l.org/QbJP1/perf#tabs

<?php

preg_replace('~(?>[\pZ\pC]+)$~u', '', str_repeat("\u{2029}", 20000));
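This observation is expected from the pattern itself: without the trailing 'x', the very first attempt at offset 0 succeeds, because the atomic group consumes the whole run and $ matches at the end of the subject, so the engine never retries later start offsets. A short sketch that checks where the match starts (`PREG_OFFSET_CAPTURE` reports byte offsets):

```php
<?php
// Without the trailing 'x', the first attempt (at offset 0) already succeeds:
// the atomic group consumes the whole run and $ matches at the end of the
// subject, so the engine never bumps along to later start offsets.
$run = str_repeat("\u{2029}", 20000);

$found = preg_match('~(?>[\pZ\pC]+)$~u', $run, $m, PREG_OFFSET_CAPTURE);

var_dump($found);           // int(1)
var_dump($m[0][1]);         // int(0): byte offset where the match starts
var_dump(strlen($m[0][0])); // int(60000): U+2029 is 3 bytes in UTF-8
```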
 [2016-07-21 22:19 UTC] leigh@php.net
-Status: Open +Status: Closed -Package: Performance problem +Package: PCRE related -Assigned To: +Assigned To: leigh
 [2016-07-21 22:19 UTC] leigh@php.net
Ignore me. Backref. I'm an idiot.
 [2016-07-21 22:20 UTC] leigh@php.net
-Status: Closed +Status: Not a bug -Assigned To: leigh +Assigned To:
 