Bug #34530 preg_match result depends on input string length when it should not
Submitted: 2005-09-16 19:27 UTC
Modified: 2005-09-17 05:22 UTC
From: php at clayst dot com
Assigned:
Status: Not a bug
Package: PCRE related
PHP Version: 4.4.0
OS: Windows XP Pro
Private report: No
CVE-ID: None
 [2005-09-16 19:27 UTC] php at clayst dot com
Description:
------------
A regular expression which involves some unnecessary extra loops in execution will give the proper result (a match) with a given input string but will fail to match when it should if the input is one character longer.

In the example, the pattern is intended to match a string which contains:

(1) An optional substring consisting of one or more groups of word characters separated by dashes, with the entire substring (if present) terminated by a period; followed by

(2) A required substring consisting of one or more groups of word characters separated by dashes.

The 'good' pattern shown does this properly.  The 'bad' pattern incorrectly makes the dashes between substrings optional -- i.e. there's an extra '?' after the '-' in both portions of the pattern.

This error makes the pattern inefficient as there are many possible matching substrings, but I believe it should still match a simple alpha string.  In fact it fails to match if the input string is more than 20 characters long.


Reproduce code:
---------------
<?php
$goodpattern = '/^((\w+(-\w+)*)\.)?(\w+(-\w+)*)$/';
$badpattern = '/^((\w+(-?\w+)*)\.)?(\w+(-?\w+)*)$/';
$string1 = str_repeat('a', 20);
$string2 = str_repeat('a', 21);

print('Good pattern, 20 characters: ' . preg_match($goodpattern, $string1) . "\n");
print('Good pattern, 21 characters: ' . preg_match($goodpattern, $string2) . "\n");

print('Bad pattern, 20 characters:  ' . preg_match($badpattern, $string1) . "\n");
print('Bad pattern, 21 characters:  ' . preg_match($badpattern, $string2) . "\n");
?>

Expected result:
----------------
All matches should return a 1 because they all match the given string.

Actual result:
--------------
The first three matches return a 1 but the last returns a 0:

Good pattern, 20 characters: 1
Good pattern, 21 characters: 1
Bad pattern, 20 characters:  1
Bad pattern, 21 characters:  0
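
For readers on later PHP versions, here is a minimal sketch of how such a result can be disambiguated. It assumes PHP 5.2.0 or later, where preg_last_error() is available; that function does not exist in the 4.4.0 release this report was filed against. The helper name describe_match is hypothetical. It separates a genuine non-match (0) from an engine failure (false), which a plain string concatenation prints as an empty string.

```php
<?php
// Hypothetical helper: reports whether preg_match() matched, did not match,
// or gave up because of an internal PCRE limit. Assumes PHP >= 5.2.0.
function describe_match($pattern, $subject)
{
    $result = preg_match($pattern, $subject);
    if ($result === 1) {
        return 'match';
    }
    if ($result === 0) {
        return 'no match';
    }
    // $result is false: the engine failed rather than finding no match.
    switch (preg_last_error()) {
        case PREG_BACKTRACK_LIMIT_ERROR:
            return 'error: backtrack limit exhausted';
        case PREG_RECURSION_LIMIT_ERROR:
            return 'error: recursion limit exhausted';
        default:
            return 'error: code ' . preg_last_error();
    }
}

$badpattern = '/^((\w+(-?\w+)*)\.)?(\w+(-?\w+)*)$/';
print('Bad pattern, 21 characters:  ' . describe_match($badpattern, str_repeat('a', 21)) . "\n");
```

Whether the last line reports a match or an error depends on the PCRE build and its configured limits, but it should not report a plain "no match" for this input.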

 [2005-09-16 21:02 UTC] sniper@php.net
Thank you for not searching the bug database before submitting yet another bogus bug report about this issue.
Read about the limitations in PCRE library here:
    
    http://www.pcre.org/pcre.txt

This is not a PHP bug, just a limitation of the PCRE library.
(also mentioned in the manual, if you had bothered reading it)
 [2005-09-17 05:10 UTC] php at clayst dot com
There is no need to be nasty, particularly since you have no idea what I did and did not do and in fact your deductions are wrong on both counts.

I am a developer myself and I know better than to submit a bug report without trying to see if someone else found the same thing first.  You have the advantage of knowing about the prior reports, which makes the similarities obvious.  I did not.

I searched the bugs database several times using the advanced search page and several different sets of keywords and other conditions.  I did not find anything that looked like this bug.  I also reviewed each of the reports that the system pulled up on my initial submission as potentially similar bugs and found nothing that matched.

I also read the Pattern Syntax and preg_match pages of the manual, multiple times in fact, and have read them many times before.  I saw several statements that explain about various types of loops, stack space, etc.  Some of them might apply but there is no way to tell which, if any, do apply.

More generally, IMO PHP should not fail silently and relatively randomly (in this case depending on the length of an input to a function) due to some kind of overflow condition.  In the software I write I'd consider that to be a bug, whether it occurred in an external library I chose to use or in my own code.  If a limitation can't be worked around that's completely fine and not unusual, but IMO running into it should produce an error, not silence.  PHP developers might make a different choice and decide that silent failure in this case is within their design parameters but then that should be documented.  I don't see that it is.
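
The behaviour asked for here, an error instead of silence, can be approximated in user code on later PHP versions. This is a minimal sketch, assuming PHP 5.2.0 or later; preg_match_or_throw is a hypothetical name, not part of PHP.

```php
<?php
// Hypothetical wrapper: turns a failed preg_match() (return value false)
// into an exception, while passing 0 and 1 through unchanged.
function preg_match_or_throw($pattern, $subject)
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        throw new RuntimeException(
            'preg_match failed, preg_last_error() = ' . preg_last_error()
        );
    }
    return $result;
}

try {
    $matched = preg_match_or_throw('/^(\w+(-\w+)*)$/', 'foo-bar');
    print("Matched: $matched\n");
} catch (RuntimeException $e) {
    print('Regex engine error: ' . $e->getMessage() . "\n");
}
```

The strict comparison with false matters: a loose check such as !$result would treat an ordinary non-match (0) as a failure.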
 [2005-09-17 05:22 UTC] php at clayst dot com
A specific question -- the PCRE file you referred to documents 4 limitations:

(1) 64KB in a compiled pattern under normal circumstances;
(2) 64K max for repeating quantifiers, 64K max capturing subpatterns;
(3) Max subpattern nesting depth 200; and
(4) Stack space may limit the size of a subject string due to use of recursion for subpatterns and repetition.

As far as I can see only (4) could apply to this case.  Is that what you are referring to?  If so, why is there no crash and no error but merely a silent incorrect return from preg_match?  Does either PCRE or PHP fail to notice a stack overflow?
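
On the question of which limit is involved: later PHP releases (5.2.0 and later) expose two PCRE limits as ini settings, pcre.backtrack_limit and pcre.recursion_limit, and report which one was hit through preg_last_error(). None of this exists in 4.4.0, so the sketch below only illustrates the mechanism on a newer interpreter and is not a statement about what 4.4.0 did internally. It lowers the backtrack limit (1000 is an arbitrary value chosen for illustration) and turns off the JIT where that setting exists, so the outcome depends less on the build's defaults.

```php
<?php
// Assumes PHP >= 5.2.0. pcre.jit only exists from PHP 7.0; on older builds
// ini_set() returns false for it and the call is harmless.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000'); // arbitrary low value for illustration

print('pcre.backtrack_limit = ' . ini_get('pcre.backtrack_limit') . "\n");
print('pcre.recursion_limit = ' . ini_get('pcre.recursion_limit') . "\n");

$badpattern = '/^((\w+(-?\w+)*)\.)?(\w+(-?\w+)*)$/';
$result = preg_match($badpattern, str_repeat('a', 21));

// false (not 0) signals that the engine stopped early instead of finishing.
$hitBacktrackLimit = ($result === false
    && preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR);
print('Stopped by backtrack limit: ' . ($hitBacktrackLimit ? 'yes' : 'no') . "\n");
```

The 'bad' pattern has to try a very large number of ways of splitting the run of word characters before it can abandon the optional first group, which is why a small backtrack limit is expected to be exhausted on this input.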
 