Request #51777 RegEx matching fails
Submitted: 2010-05-09 18:36 UTC Modified: 2010-05-18 08:15 UTC
From: trevor at ridgebizdev dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.3.2 OS: Windows XP
Private report: No CVE-ID: None
 [2010-05-09 18:36 UTC] trevor at ridgebizdev dot com
Description:
------------
When a RegEx has to "look" (backtrack) more than roughly 32768 times during a match that should succeed, every preg_ function fails and returns the empty string.



Test script:
---------------
<?php

	$response = http_get("http://www.travelocity.com");
	// no problem with these first 2 RegExs
	$response2 = preg_replace('/\s+/'," ",$response);
	$mytitle = preg_replace('/.*?<\s*title\s*>([^<]*)<.*/i','${1}',$response2);
	echo "\nTitle Match Forward: ".$mytitle."\n\n";
	// now, here's a problem
	$mytitle2 = preg_replace('/.*<\s*title\s*>([^<]*)<.*/i','${1}',$response2);
	echo "\nTitle Match Backward: ".$mytitle2."\n\n";

?>

Expected result:
----------------
$mytitle gets extracted properly and echoed, because the RegEx never looks more than 32768 times when it starts at the beginning of the travelocity.com page source.  $mytitle2 never gets extracted, because the RegEx looks more than 32768 times and preg_replace() gives up and returns the empty string instead of the match.  Matching forward for the title works; matching backward for the title fails on large buffers.

Actual result:
--------------
Title Match Forward: Travelocity Travel: Airline Tickets, Hotels, Flights, Vacations, Cruises &amp; Car Rentals


Title Match Backward: 

Patches

add-bigger-RegEx-engine (last revision 2010-05-09 16:43 UTC by trevor at ridgebizdev dot com)

Pull Requests

History

 [2010-05-09 23:02 UTC] felipe@php.net
-Package: Regexps related +Package: PCRE related
 [2010-05-12 09:14 UTC] mike@php.net
-Status: Open +Status: Feedback
 [2010-05-12 09:14 UTC] mike@php.net
Please try using this snapshot:

  http://snaps.php.net/php5.3-latest.tar.gz
 
For Windows:

  http://windows.php.net/snapshots/

Works here. Do you have a pcre.backtrack_limit set?
 [2010-05-12 20:10 UTC] trevor at ridgebizdev dot com
-Status: Feedback +Status: Open
-Type: Bug +Type: Feature/Change Request
-Operating System: Windows XP and Linux Server +Operating System: Windows XP
 [2010-05-12 20:10 UTC] trevor at ridgebizdev dot com
[Pcre]
;PCRE library backtracking limit.
; http://php.net/pcre.backtrack-limit
;pcre.backtrack_limit=100000

;PCRE library recursion limit.
;Please note that if you set this value to a high number you may consume all
;the available process stack and eventually crash PHP (due to reaching the
;stack size limit imposed by the Operating System).
; http://php.net/pcre.recursion-limit
;pcre.recursion_limit=100000

I have nothing set (I must be using the default).  I use Windows XP Professional with 4GB RAM and Core2Duo.
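
For anyone following along, the limits actually in effect can be read at runtime instead of being inferred from the commented-out php.ini lines. This is only a minimal sketch: the new value of 1000000 is an arbitrary number picked for illustration, not a recommendation from this report.

```php
<?php
// Read the PCRE limits currently in effect (ini_get() returns them as strings).
$backtrackBefore = ini_get('pcre.backtrack_limit');
$recursionBefore = ini_get('pcre.recursion_limit');
echo "pcre.backtrack_limit = {$backtrackBefore}\n";
echo "pcre.recursion_limit = {$recursionBefore}\n";

// Raise the backtrack limit for this request only.
// Assumption: 1000000 is an illustrative value, not a tuned one.
ini_set('pcre.backtrack_limit', '1000000');
$backtrackAfter = ini_get('pcre.backtrack_limit');
echo "pcre.backtrack_limit is now {$backtrackAfter}\n";
```

The same setting can be made permanent by uncommenting the pcre.backtrack_limit line in php.ini and changing its value.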
 [2010-05-13 04:19 UTC] trevor at ridgebizdev dot com
Since setting pcre.backtrack_limit works around the failure, 51777 becomes an installation feature request; however, because no warning is raised when the limit is hit, the report remains a bug.
 [2010-05-17 08:22 UTC] mike@php.net
-Status: Open +Status: Bogus
 [2010-05-17 21:24 UTC] trevor at ridgebizdev dot com
preg_last_error() is stellar.  Why is it not used properly by the preg_ functions?
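
A small sketch of how preg_last_error() exposes the failure described in this report. The subject string is synthetic (a stand-in for the downloaded page), and the limit is deliberately set very low so the effect shows up without a large download; both are assumptions made for illustration. JIT is switched off, where that setting exists, so that the interpreter's backtrack counting applies.

```php
<?php
// Assumptions for illustration: no JIT, a very low limit, a synthetic subject.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

// The title sits at the start, followed by a long tail of markup.
$subject = '<title>Example</title>' . str_repeat('<p>x</p> ', 20000);

// Forward (lazy) match: finds the title after very little work.
$lazy = preg_replace('/.*?<\s*title\s*>([^<]*)<.*/i', '${1}', $subject);

// Backward (greedy) match: .* runs to the end and must backtrack
// through the whole tail, which exceeds the limit.
$greedy = preg_replace('/.*<\s*title\s*>([^<]*)<.*/i', '${1}', $subject);
$greedyError = preg_last_error();

var_dump($lazy);
var_dump($greedy);
if ($greedyError === PREG_BACKTRACK_LIMIT_ERROR) {
    echo "Backtrack limit was exhausted\n";
}
```

On failure preg_replace() returns NULL, which echo prints as nothing, so the only way to tell this case apart from an ordinary result is to test for NULL or to call preg_last_error() right after the preg_ call.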
 [2010-05-18 08:15 UTC] trevor at ridgebizdev dot com
preg_last_error() has only been available since PHP 5.2.0.  If you notice my style of ${1}, you know I've been using these functions since before 5.2.0.  Who demanded preg_last_error() and when?  Why don't the preg_ functions use preg_last_error()?  preg_last_error() is working, but it's only half of the process of finishing the preg_ functions.  Whether the expression fails to match or the expression engine gives up (which looks just like failing to match), the return value should be the argument string, not the empty string.  Do you dig it?
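
The behavior asked for here can be approximated in userland. The following is a rough sketch only: preg_replace_or_subject() is a hypothetical helper name, not a PHP function, and the low limit and synthetic subject in the usage lines are assumptions chosen to trigger the failure quickly.

```php
<?php
// Hypothetical helper (not part of PHP): behaves like preg_replace(), but
// returns the unmodified subject when the engine fails, and reports the
// PCRE error code through the optional by-reference $error argument.
function preg_replace_or_subject($pattern, $replacement, $subject, &$error = null)
{
    $result = preg_replace($pattern, $replacement, $subject);
    $error  = preg_last_error();

    // preg_replace() returns NULL when an error occurred.
    if ($result === null) {
        return $subject;
    }
    return $result;
}

// Usage, under the illustrative assumptions named above.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');
$page = '<title>Example</title>' . str_repeat('<p>x</p> ', 20000);

$out = preg_replace_or_subject('/.*<\s*title\s*>([^<]*)<.*/i', '${1}', $page, $err);
echo ($out === $page) ? "subject returned unchanged\n" : "replaced\n";
echo ($err === PREG_BACKTRACK_LIMIT_ERROR) ? "backtrack limit hit\n" : "no limit error\n";
```

Returning the subject hides the failure from callers that ignore $error, so whether this is preferable to the built-in NULL return is a design trade-off and not a settled answer.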
 