[2006-12-26 07:24 UTC] imacat at mail dot imacat dot idv dot tw
[2006-12-26 09:48 UTC] tony2001@php.net
[2006-12-26 15:06 UTC] imacat at mail dot imacat dot idv dot tw
Description:
------------
Hi. This is imacat from Taiwan. I am seeing PCRE failures on long matches. Matching doesn't seem to get past a limit of about 50000 characters, UTF-8 or not. However, looking into the bundled PCRE library directory, I could not find such a limit anywhere. In config0.m4 the setting is -DMATCH_LIMIT=10000000. pcrelib/README states that the default of --with-match-limit is 500000. In php.ini I saw pcre.backtrack_limit=100000. None of the limits I found corresponds to the 50000 limit I am hitting. This is a problem for me, since I have several articles to parse that are longer than 50000 bytes/characters.

Reproduce code:
---------------
#! /usr/bin/php
<?php
echo "=== Test #01 Non-UTF-8\n";
$a = str_repeat("a", 49997);
echo "1. \"a\" repeated 49997 times\n";
printf(" strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
$a = str_repeat("a", 49998);
echo "2. \"a\" repeated 49998 times\n";
printf(" strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
echo "=== Test #02 UTF-8\n";
$a = str_repeat("\xE4\xB8\x80", 49997);
echo "1. \"\\xE4\\xB8\\x80\" repeated 49997 times\n";
printf(" strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
$a = str_repeat("\xE4\xB8\x80", 49998);
echo "2. \"\\xE4\\xB8\\x80\" repeated 49998 times\n";
printf(" strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
?>

Expected result:
----------------
=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
 strlen(): 49997, preg_match(): 1
2. "a" repeated 49998 times
 strlen(): 49998, preg_match(): 0
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
 strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
 strlen(): 149994, preg_match(): 0

Actual result:
--------------
=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
 strlen(): 49997, preg_match(): 1
2. "a" repeated 49998 times
 strlen(): 49998, preg_match(): 1
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
 strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
 strlen(): 149994, preg_match(): 1
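
One thing the reproduce script cannot show is why a call reports 0: preg_match() returns 0 for "no match" but false on an error, and printf("%d") prints both as 0. The following is a minimal sketch of how the two cases might be told apart using preg_last_error(). The helper name describe_preg_match() is hypothetical and not part of the report, and the sketch assumes PHP 7 or later for the type declarations and the ?? operator. It is only an illustration of the check, not a diagnosis of the behaviour reported above.

```php
<?php
// Hypothetical helper (not from the original report): runs preg_match() and
// says whether the result is a match, a genuine non-match, or a PCRE error.
function describe_preg_match(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);

    if ($result === false) {
        // false means PCRE gave up; printf("%d") would display this as 0.
        $names = [
            PREG_INTERNAL_ERROR        => 'internal error',
            PREG_BACKTRACK_LIMIT_ERROR => 'backtrack limit exhausted',
            PREG_RECURSION_LIMIT_ERROR => 'recursion limit exhausted',
            PREG_BAD_UTF8_ERROR        => 'malformed UTF-8',
        ];
        $code = preg_last_error();
        return 'error: ' . ($names[$code] ?? "code $code");
    }

    return $result === 1 ? 'match' : 'no match';
}

// Same pattern and subject length as case 2 of Test #01 in the report.
// The outcome depends on the PHP version and the pcre.* ini settings in use.
$subject = str_repeat('a', 49998);
echo 'pcre.backtrack_limit = ', ini_get('pcre.backtrack_limit'), "\n";
echo describe_preg_match('/^(.*?)\s*$/us', $subject), "\n";
```

If the helper reports an exhausted backtrack limit, raising the setting at runtime with ini_set('pcre.backtrack_limit', ...) before the call is one way to test whether that limit is the cause.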