Bug #38000 preg_match fails if strlen >= 1630
Submitted: 2006-07-04 00:46 UTC
Modified: 2006-07-06 22:32 UTC
From: dave at smartboy dot com
Assigned:
Status: Not a bug
Package: PCRE related
PHP Version: 5.1.4
OS: Windows XP
Private report: No
CVE-ID: None
 [2006-07-04 00:46 UTC] dave at smartboy dot com
Description:
------------
preg_match(), called with a regexp that tests for a valid UTF-8 sequence, fails and causes the error message:

'Could not open input file: <name of script>.php'

if the subject string passed to preg_match() is longer than 1629 characters (in this case, the contents of the file 'zzz', which contains ASCII).

There was no such limitation in preg_match() in the previous version of PHP (5.1.2).



Reproduce code:
---------------
$str = file_get_contents('zzz');
echo "Loaded file...\n";
$result = preg_match('/^([\x00-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|' .
		'[\xe1-\xec][\x80-\xbf]{2}|\xed[\x80-\x9f][\x80-\xbf]|[\xee-\xef][\x80-\xbf]{2}|' .
		'\xf0[\x90-\xbf][\x80-\xbf]{2}|[\xf1-\xf3][\x80-\xbf]{3}|\xf4[\x80-\x8f][\x80-\xbf]{2})*$/S',
		$str) === 1;

echo "Back from preg_match()\n";
var_export($result);
echo "\n";


Expected result:
----------------
Loaded file...
Back from preg_match()
true


Actual result:
--------------
Loaded file...
Could not open input file: u8.php


 [2006-07-04 14:40 UTC] nlopess@php.net
please provide the zzz file. (either post a link to it or send it to my e-mail).
 [2006-07-04 21:23 UTC] dave at smartboy dot com
It turns out that no test file is needed:

$str = str_repeat('a', 1575);  //works

$str = str_repeat('a', 1576);  //fails

There are really two issues:
- PCRE does not work for "long" strings (although 1576 bytes is not really a "long" string). This greatly limits the usefulness of regexp pattern matching.

- When PCRE fails, the error message is VERY misleading. Surely preg_match() should issue an E_NOTICE if the match failed because it ran out of memory or hit a similar limit? "Could not open input file" is just plain wrong.
 [2006-07-04 21:33 UTC] dave at smartboy dot com
$str = str_repeat('a', 1000000);  //works in PHP 5.1.2
 [2006-07-06 22:32 UTC] nlopess@php.net
What I can reproduce is PCRE hitting its backtracking and recursion limits. This is because your regex is particularly expensive to run (note that it needs to backtrack a lot!).
The other problem is that the bundled PCRE library was upgraded in PHP 5.1.3, and that version has some 'S'tudy features disabled, which is what you are being hit by. PCRE was previously optimizing your complex regex, but now it isn't. We'll have to wait for the next release of PCRE.

In the meantime, you may want to upgrade to PHP 5.2, where you can tweak the backtrack and recursion limits at run time (you can always tweak them at compile time in any version).
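
For readers on PHP 5.2 or later, here is a rough sketch of what tweaking the limits at run time might look like. It relies on the pcre.backtrack_limit and pcre.recursion_limit ini settings and on preg_last_error(), all of which were introduced in PHP 5.2.0. The helper name is_valid_utf8() and the limit values are assumptions chosen for illustration, not recommendations from this report; a very high recursion limit may itself exhaust the process stack on some builds.

```php
<?php
// Hypothetical helper (not part of PHP): returns true or false when the
// match ran to completion, or null when PCRE gave up before finishing.
function is_valid_utf8($str)
{
    // Same pattern as in the original report.
    $pattern = '/^([\x00-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|' .
        '[\xe1-\xec][\x80-\xbf]{2}|\xed[\x80-\x9f][\x80-\xbf]|[\xee-\xef][\x80-\xbf]{2}|' .
        '\xf0[\x90-\xbf][\x80-\xbf]{2}|[\xf1-\xf3][\x80-\xbf]{3}|\xf4[\x80-\x8f][\x80-\xbf]{2})*$/S';

    $matched = preg_match($pattern, $str);

    // preg_last_error() separates "did not match" from "hit a limit".
    if ($matched === false || preg_last_error() !== PREG_NO_ERROR) {
        return null;
    }
    return $matched === 1;
}

// Illustrative values only; tune them for your own input sizes.
ini_set('pcre.backtrack_limit', '1000000');
ini_set('pcre.recursion_limit', '100000');

$result = is_valid_utf8(str_repeat('a', 1576));
if ($result === null) {
    echo "PCRE gave up, preg_last_error() = ", preg_last_error(), "\n";
} else {
    var_export($result);
    echo "\n";
}
```

Checking preg_last_error() after the call is what lets a script tell PREG_BACKTRACK_LIMIT_ERROR or PREG_RECURSION_LIMIT_ERROR apart from an ordinary non-match, which addresses the reporter's complaint about the lack of a meaningful error.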
 