[2006-07-04 00:46 UTC] dave at smartboy dot com
Description:
------------
Calling preg_match() with a regexp that tests for valid UTF-8 sequences fails and produces the error message:
'Could not open input file: <name of script>.php'
if the subject string passed to preg_match() is longer than 1629 characters (in this case, the contents of the file 'zzz', which contains ASCII text).
There was no such limitation in preg_match() in the previous version of PHP (5.1.2).
Reproduce code:
---------------
$str = file_get_contents('zzz');
echo "Loaded file...\n";
$result = preg_match('/^([\x00-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|' .
'[\xe1-\xec][\x80-\xbf]{2}|\xed[\x80-\x9f][\x80-\xbf]|[\xee-\xef][\x80-\xbf]{2}|' .
'\xf0[\x90-\xbf][\x80-\xbf]{2}|[\xf1-\xf3][\x80-\xbf]{3}|\xf4[\x80-\x8f][\x80-\xbf]{2})*$/S',
$str) === 1;
echo "Back from preg_match()\n";
var_export($result);
echo "\n";
Expected result:
----------------
Loaded file...
Back from preg_match()
true
Actual result:
--------------
Loaded file...
Could not open input file: u8.php
It turns out that no test file is needed:
$str = str_repeat('a', 1575); //works
$str = str_repeat('a', 1576); //fails
There are really two issues:
- PCRE not working for "long" strings (although 1576 bytes is not really a "long" string). This greatly limits the usefulness of regexp pattern matching.
- When PCRE fails the error message is VERY misleading. Surely an E_NOTICE should be issued by preg_match() if match has failed due to out of memory, etc.? "Could not open input file" is just plain wrong.
$str = str_repeat('a', 1000000); //works in PHP 5.1.2
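The distinction asked for above, between "the subject did not match" and "the match itself failed", can be sketched as follows. This is only an illustration and assumes a PHP version that provides preg_last_error() (added in PHP 5.2.0, so not available in the versions discussed in this report). The helper name describe_match() is hypothetical. If the process dies inside PCRE, as the misleading message suggests may be happening here, no PHP-level check of this kind would get a chance to run.

```php
<?php
// Pattern from the report: zero or more well-formed UTF-8 byte sequences.
$utf8Pattern = '/^([\x00-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|' .
    '[\xe1-\xec][\x80-\xbf]{2}|\xed[\x80-\x9f][\x80-\xbf]|[\xee-\xef][\x80-\xbf]{2}|' .
    '\xf0[\x90-\xbf][\x80-\xbf]{2}|[\xf1-\xf3][\x80-\xbf]{3}|\xf4[\x80-\x8f][\x80-\xbf]{2})*$/S';

// Hypothetical helper: report whether preg_match() matched, did not match,
// or failed internally (in which case it returns false, not 0).
function describe_match($pattern, $subject)
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        return 'error code ' . preg_last_error();
    }
    return $result === 1 ? 'match' : 'no match';
}

// Subject lengths taken from the comment above.
foreach (array(1575, 1576, 1000000) as $length) {
    echo $length, ': ', describe_match($utf8Pattern, str_repeat('a', $length)), "\n";
}
```

Comparing the return value with === false matters here, because a plain non-match returns 0 and would otherwise be indistinguishable from an internal failure.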