[2016-07-27 02:48 UTC] ju1ius at laposte dot net
[2016-07-27 09:57 UTC] nikic@php.net
[2016-08-20 13:33 UTC] cmb@php.net
-Summary: No support for PCRE2_NO_UTF_CHECK flag
+Summary: No support for PCRE_NO_UTF_CHECK flag
[2019-03-18 12:26 UTC] nikic@php.net
-Assigned To:
+Assigned To: nikic
[2019-03-18 12:26 UTC] nikic@php.net
[2019-03-18 15:59 UTC] nikic@php.net
[2019-03-18 15:59 UTC] nikic@php.net
-Status: Assigned
+Status: Closed
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Mon Oct 27 07:00:01 2025 UTC
Description:
------------
When matching in UTF-8 mode (the `u` modifier), `preg_match()` checks the validity of the entire subject string on every call, even when a start offset is given. PCRE2 has an option to skip this check, `PCRE2_NO_UTF_CHECK`, but PHP does not expose it.

As a result, matching many times against the same (potentially long) subject, as a lexer does, repeats the same validation work over and over. Each call scans all n bytes of the subject, and a loop that advances one character per call makes n calls, so the total cost is `O(n²)` instead of the expected `O(n)`.

Test script:
---------------
<?php
// see online at https://3v4l.org/Duv7g
$input_size = 1e4; // also run with 1e5 and 2*1e5 (see "Actual result")

// Baseline: advance one character per call, without the u modifier.
$str = str_repeat('a', (int) $input_size);
$start = microtime(true);
$pos = 0;
// \G anchors each match at $pos, so every call matches exactly one character.
while (preg_match('/\G\w/', $str, $m, 0, $pos)) ++$pos;
$end = microtime(true);
echo '>>> NO u flag: ', number_format(($end - $start) * 1000, 6), 'ms', PHP_EOL;

// Same loop with the u modifier: every call re-validates all of $str as UTF-8.
$str = str_repeat('e', (int) $input_size);
$start = microtime(true);
$pos = 0;
while (preg_match('/\G\w/u', $str, $m, 0, $pos)) ++$pos;
$end = microtime(true);
echo '>>> WITH u flag: ', number_format(($end - $start) * 1000, 6), 'ms', PHP_EOL;

Expected result:
----------------
I expect the two loops to take roughly the same time, with the second one only slightly slower. Both should run in `O(n)`.

Actual result:
--------------
With $input_size === 1e4:
>>> NO u flag: 9.632111ms
>>> WITH u flag: 109.670877ms

With $input_size === 1e5:
>>> NO u flag: 96.043110ms
>>> WITH u flag: 10,151.215076ms

With $input_size === 2*1e5:
>>> NO u flag: 188.354015ms
>>> WITH u flag: 40,387.295008ms

This is consistent with `O(n)` growth without the `u` flag and `O(n²)` growth with it: doubling the input from 1e5 to 2e5 roughly doubles the first timing (96 ms to 188 ms) but roughly quadruples the second (10,151 ms to 40,387 ms).
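The quadratic growth follows from a simple cost model. It assumes, as the report describes, that each `preg_match()` call with `/u` first validates all n bytes of the subject at a per-byte cost c_v, then matches one character at a constant cost c_m. Both constants are illustrative symbols, not measured values, and the final failing call of each loop is ignored:

```latex
T_{\text{plain}}(n) = \sum_{k=0}^{n-1} c_m = c_m\,n = O(n)
\qquad
T_{u}(n) = \sum_{k=0}^{n-1} \left( c_v\,n + c_m \right) = c_v\,n^2 + c_m\,n = O(n^2)
```

Under this model the validation term c_v n² dominates for long subjects. That term is exactly the work `PCRE2_NO_UTF_CHECK` would skip on every call after the first.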
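Until the check can be skipped, one userland workaround is to tokenize with a single `preg_match_all()` call instead of one `preg_match()` call per token. This sketch assumes that `preg_match_all()` validates the subject once per call and skips the check for later matches within the same call. That is our understanding of PHP's PCRE extension, not something stated in this report. `tokenize()` and its token types (`word`, `space`, `punct`) are hypothetical names for illustration. `PREG_UNMATCHED_AS_NULL` requires PHP 7.2 or later.

```php
<?php
// Hypothetical lexer sketch: one preg_match_all() call over the whole subject,
// so (by assumption) the UTF-8 validity check runs once, not once per token.

/**
 * @return list<array{0: string, 1: string}>|null  [type, text] pairs, or null if
 *         $input is not valid UTF-8 or is not fully covered by the token rules.
 */
function tokenize(string $input): ?array
{
    // \G pins every match to the end of the previous one, so matching stops
    // at the first position where no alternative applies.
    $pattern = '/\G(?:(?<word>\w+)|(?<space>\s+)|(?<punct>[^\w\s]))/u';

    // PREG_UNMATCHED_AS_NULL reports groups that did not take part as null.
    $n = preg_match_all($pattern, $input, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);
    if ($n === false) {
        return null; // e.g. malformed UTF-8: preg_last_error() === PREG_BAD_UTF8_ERROR
    }

    $tokens = [];
    $consumed = 0;
    foreach ($matches as $m) {
        if (($m['word'] ?? null) !== null) {
            $tokens[] = ['word', $m['word']];
        } elseif (($m['space'] ?? null) !== null) {
            $tokens[] = ['space', $m['space']];
        } else {
            $tokens[] = ['punct', $m['punct']];
        }
        $consumed += strlen($m[0]); // byte length of the whole match
    }

    // Defensive: these rules cover every code point, but narrower rules
    // could stop early and leave part of the input unmatched.
    return $consumed === strlen($input) ? $tokens : null;
}
```

For example, `tokenize('foo = bar;')` yields six tokens: `word`, `space`, `punct`, `space`, `word`, `punct`. The trade-off is that the whole token grammar must fit in one regex. A stateful lexer that picks the next pattern based on the previous token still needs per-call matching, which is where an option to skip re-validation would help.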