Bug #39951 PCRE Failed for Long Matches (Less Than MATCH_LIMIT)
Submitted: 2006-12-26 07:20 UTC Modified: 2006-12-26 15:17 UTC
From: imacat at mail dot imacat dot idv dot tw Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.2.0 OS: Linux 2.6.16.29
Private report: No CVE-ID: None
 [2006-12-26 07:20 UTC] imacat at mail dot imacat dot idv dot tw
Description:
------------
    Hi.  This is imacat from Taiwan.  I am seeing PCRE fail on long matches.  Matching stops working once the subject reaches about 50000 characters, UTF-8 or not.  However, looking through the bundled PCRE library directory, I could not find any such limit.  In config0.m4 the setting is -DMATCH_LIMIT=10000000.  pcrelib/README states that the default of --with-match-limit is 500000.  In php.ini I saw pcre.backtrack_limit=100000.  Every limit I found is far greater than the 50000 at which matching fails.

    This is a problem for me, since I have several articles to parse that are larger than 50000 bytes/characters.

Reproduce code:
---------------
#! /usr/bin/php
<?php
echo "=== Test #01 Non-UTF-8\n";
$a = str_repeat("a", 49997);
echo "1. \"a\" repeated 49997 times\n";
printf("   strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
$a = str_repeat("a", 49998);
echo "2. \"a\" repeated 49998 times\n";
printf("   strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
echo "=== Test #02 UTF-8\n";
$a = str_repeat("\xE4\xB8\x80", 49997);
echo "1. \"\\xE4\\xB8\\x80\" repeated 49997 times\n";
printf("   strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
$a = str_repeat("\xE4\xB8\x80", 49998);
echo "2. \"\\xE4\\xB8\\x80\" repeated 49998 times\n";
printf("   strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
?>


Expected result:
----------------
=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
   strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
   strlen():  49998, preg_match(): 0
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
   strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
   strlen(): 149994, preg_match(): 0


Actual result:
--------------
=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
   strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
   strlen():  49998, preg_match(): 1
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
   strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
   strlen(): 149994, preg_match(): 1


 [2006-12-26 07:24 UTC] imacat at mail dot imacat dot idv dot tw
I was wrong. ^^; sorry.  The Expected Result is:

=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
   strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
   strlen():  49998, preg_match(): 1
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
   strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
   strlen(): 149994, preg_match(): 1

And the Actual Result is:

=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
   strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
   strlen():  49998, preg_match(): 0
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
   strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
   strlen(): 149994, preg_match(): 0
 [2006-12-26 09:48 UTC] tony2001@php.net
http://php.net/pcre

Table 1. PCRE Configuration Options
Name	Default	Changeable	Changelog
pcre.backtrack_limit	100000	PHP_INI_ALL	Available since PHP 5.2.0.
pcre.recursion_limit	100000	PHP_INI_ALL	Available since PHP 5.2.0.
 [2006-12-26 15:06 UTC] imacat at mail dot imacat dot idv dot tw
Well... This must have been some kind of blind spot.  I spent a lot of time tracking down the configuration setting, but never thought of changing it.

Raising it seems to solve the problem.  I'm terribly sorry for the bother.
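
For anyone landing here with the same symptom, here is a minimal sketch of the workaround, reusing the pattern and subject from the reproduce code. The helper name match_with_backtrack_limit() and the value 10000000 are assumptions chosen for illustration, not anything prescribed in this report; pick a limit that suits your own input sizes. The sketch also checks preg_last_error(), because a call that hits the limit can look like an ordinary non-match when its return value is printed with %d.

```php
<?php
// Hypothetical helper (not part of the original report): run preg_match()
// under a temporarily raised pcre.backtrack_limit and return both the
// match result and the PCRE error code.
function match_with_backtrack_limit($pattern, $subject, $limit)
{
    $previous = ini_get('pcre.backtrack_limit');
    ini_set('pcre.backtrack_limit', (string) $limit);

    $result = preg_match($pattern, $subject);
    $error  = preg_last_error();

    // Restore the previous limit so other code is unaffected.
    ini_set('pcre.backtrack_limit', $previous);

    return array('result' => $result, 'error' => $error);
}

// Same subject and pattern as case 2 of Test #01 in the reproduce code.
$subject = str_repeat("a", 49998);
$outcome = match_with_backtrack_limit('/^(.*?)\s*$/us', $subject, 10000000);

if ($outcome['error'] === PREG_BACKTRACK_LIMIT_ERROR) {
    echo "Backtrack limit is still too low for this subject\n";
} elseif ($outcome['error'] !== PREG_NO_ERROR) {
    echo "Other PCRE error code: ", $outcome['error'], "\n";
} else {
    echo "preg_match(): ", $outcome['result'], "\n";
}
```

Since pcre.backtrack_limit is PHP_INI_ALL, the same value could instead be set in php.ini or .htaccess. Setting it per call as above keeps the higher limit confined to the one expression that needs it.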
 