Bug #39951 PCRE Failed for Long Matches (Less Than MATCH_LIMIT)
Submitted: 2006-12-26 07:20 UTC Modified: 2006-12-26 15:17 UTC
From: imacat at mail dot imacat dot idv dot tw Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.2.0 OS: Linux 2.6.16.29
Private report: No CVE-ID: None
 [2006-12-26 07:20 UTC] imacat at mail dot imacat dot idv dot tw
Description:
------------
    Hi.  This is imacat from Taiwan.  I am seeing PCRE fail on long matches: preg_match() does not seem to get past a subject length of about 50000, UTF-8 or not.  However, looking into the bundled PCRE library directory, I found no such limit anywhere.  In config0.m4 the setting is -DMATCH_LIMIT=10000000.  pcrelib/README states that the default of --with-match-limit is 500000.  In php.ini I saw pcre.backtrack_limit=100000.  Every limit I found is far greater than 50000.

    This is a problem for me, since I need to parse several articles that are larger than 50000 bytes/characters.

Reproduce code:
---------------
#! /usr/bin/php
<?php
echo "=== Test #01 Non-UTF-8\n";
$a = str_repeat("a", 49997);
echo "1. \"a\" repeated 49997 times\n";
printf("   strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
$a = str_repeat("a", 49998);
echo "2. \"a\" repeated 49998 times\n";
printf("   strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
echo "=== Test #02 UTF-8\n";
$a = str_repeat("\xE4\xB8\x80", 49997);
echo "1. \"\\xE4\\xB8\\x80\" repeated 49997 times\n";
printf("   strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
$a = str_repeat("\xE4\xB8\x80", 49998);
echo "2. \"\\xE4\\xB8\\x80\" repeated 49998 times\n";
printf("   strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
?>


Expected result:
----------------
=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
   strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
   strlen():  49998, preg_match(): 0
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
   strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
   strlen(): 149994, preg_match(): 0


Actual result:
--------------
=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
   strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
   strlen():  49998, preg_match(): 1
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
   strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
   strlen(): 149994, preg_match(): 1


 [2006-12-26 07:24 UTC] imacat at mail dot imacat dot idv dot tw
I was wrong. ^^; Sorry, I swapped the two results.  The Expected Result is:

=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
   strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
   strlen():  49998, preg_match(): 1
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
   strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
   strlen(): 149994, preg_match(): 1

And the Actual Result is:

=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
   strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
   strlen():  49998, preg_match(): 0
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
   strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
   strlen(): 149994, preg_match(): 0
 [2006-12-26 09:48 UTC] tony2001@php.net
http://php.net/pcre

Table 1. PCRE Configuration Options

Name                  Default  Changeable   Changelog
pcre.backtrack_limit  100000   PHP_INI_ALL  Available since PHP 5.2.0.
pcre.recursion_limit  100000   PHP_INI_ALL  Available since PHP 5.2.0.
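
When one of the limits in the table above is hit, preg_match() fails quietly, which is why the reproduce script only prints 0. A minimal sketch of how the cause could be surfaced, using preg_last_error() and the PREG_* error constants (both available since PHP 5.2.0). The helper name match_with_reason() is hypothetical and not part of the report:

```php
<?php
// Hypothetical helper (not from the report): run preg_match() and
// describe the outcome, including which PCRE limit was exceeded.
function match_with_reason($pattern, $subject)
{
    $result = preg_match($pattern, $subject);
    $error  = preg_last_error();

    if ($error === PREG_NO_ERROR) {
        // No engine error: the return value is a plain match / no match.
        return $result === 1 ? 'matched' : 'no match';
    }
    switch ($error) {
        case PREG_BACKTRACK_LIMIT_ERROR:
            return 'failed: pcre.backtrack_limit exceeded';
        case PREG_RECURSION_LIMIT_ERROR:
            return 'failed: pcre.recursion_limit exceeded';
        default:
            return 'failed: PCRE error code ' . $error;
    }
}
```

With the defaults quoted above, calling match_with_reason("/^(.*?)\s*$/us", $a) on the 49998-character subject from the report would be expected to name the backtrack limit rather than return a bare 0.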
 [2006-12-26 15:06 UTC] imacat at mail dot imacat dot idv dot tw
Well... This must have been some kind of blind spot.  I spent a lot of time tracking down the configuration setting, but never thought of changing it.

Changing it seems to solve the problem.  I'm terribly sorry for the bother.
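
For readers who land here with the same symptom, a minimal sketch of the fix the thread arrives at: raising pcre.backtrack_limit at runtime with ini_set(), which is allowed because the setting is PHP_INI_ALL. The wrapper name preg_match_with_limit() and the value 10000000 are assumptions chosen for illustration, not something prescribed in the report:

```php
<?php
// Hypothetical wrapper (not from the report): raise pcre.backtrack_limit
// for a single preg_match() call, then restore the previous value.
function preg_match_with_limit($pattern, $subject, $limit)
{
    // ini_set() returns the old value as a string, or false on failure.
    $previous = ini_set('pcre.backtrack_limit', (string) $limit);

    $result = preg_match($pattern, $subject);

    if ($previous !== false) {
        ini_set('pcre.backtrack_limit', $previous);
    }
    return $result;
}

// Usage with the pattern and subject size from the reproduce code.
$a = str_repeat("a", 49998);
printf("preg_match(): %d\n",
    preg_match_with_limit("/^(.*?)\s*$/us", $a, 10000000));
```

Setting the value once in php.ini or near the top of the script works just as well. The wrapper only keeps the higher limit scoped to the calls that need it.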
 