Bug #41216 PCRE_UTF8 slows down 30 times
Submitted: 2007-04-27 17:26 UTC Modified: 2007-04-27 17:43 UTC
From: DPP <paul dot dovbush at gmail dot com> Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.2.1 OS: WinXPsp2
Private report: No CVE-ID: None
 [2007-04-27 17:26 UTC] DPP <paul dot dovbush at gmail dot com>
Description:
------------
Parsing a file of 10000 lines in the following format:

level + delim + [@xref_id@ + delim +] tag + [delim + line_value +] terminator

level		digit
delim		space
xref_id	alphanum
tag		alpha (english)
line_value	any (except terminator)
terminator	\r\n

With the regexp:

$c=preg_match_all("/^\s*(\d+)\s+(@(\S+)@\s+)?(\w+)(\s+@(\S+)@\s*|.*)?$/Sm",$fp,$m,PREG_PATTERN_ORDER);

Setting the PCRE_UTF8 modifier (the u pattern modifier) slows the whole script down 30 times (from 300 ms to 9000 ms).

A more accurate regexp here might be
$c=preg_match_all("/^ *(\d+) +(@([^@\\n]+)@ +)?([^ \\n]+)( +@([^@\\n]+)@ *| +[^\\n]*)?$/m",$fp,$m,PREG_PATTERN_ORDER);
but it makes no difference.
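
The comparison described above can be reproduced with a small timing harness. This is only a sketch under assumptions: the original input file is not attached to the report, so `buildSample()` is a hypothetical helper that generates synthetic lines in the stated format, and `timePattern()` is likewise an illustrative name. The pattern is the one from the report. Timings depend heavily on the PHP and PCRE versions in use, so the 30x figure may not reproduce.

```php
<?php
// Hypothetical helper: builds synthetic input in the line format described
// in the report (level, optional @xref_id@, tag, optional value, \r\n).
function buildSample(int $lines): string
{
    $out = '';
    for ($i = 0; $i < $lines; $i++) {
        if ($i % 3 === 0) {
            $out .= "0 @I{$i}@ INDI\r\n";      // level, xref_id, tag
        } elseif ($i % 3 === 1) {
            $out .= "1 NAME Иван Петров\r\n";   // level, tag, UTF-8 line_value
        } else {
            $out .= "1 FAMC @F{$i}@\r\n";       // level, tag, xref pointer as value
        }
    }
    return $out;
}

// Runs the reporter's pattern with the given modifiers and returns
// [match count, elapsed seconds].
function timePattern(string $modifiers, string $subject): array
{
    $pattern = '/^\s*(\d+)\s+(@(\S+)@\s+)?(\w+)(\s+@(\S+)@\s*|.*)?$/' . $modifiers;
    $start = microtime(true);
    $count = preg_match_all($pattern, $subject, $m, PREG_PATTERN_ORDER);
    return [$count, microtime(true) - $start];
}

$subject = buildSample(10000);
[$plainCount, $plainTime] = timePattern('Sm', $subject);   // byte mode
[$utf8Count, $utf8Time] = timePattern('Smu', $subject);    // PCRE_UTF8 mode

printf("without u: %d matches in %.1f ms\n", $plainCount, $plainTime * 1000);
printf("with u:    %d matches in %.1f ms\n", $utf8Count, $utf8Time * 1000);
```

Comparing the two match counts as well as the timings helps confirm that both runs do the same amount of work before any conclusion is drawn from the elapsed times.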


 [2007-04-27 17:33 UTC] DPP <paul dot dovbush at gmail dot com>
Forgot to say: the file contains Russian text encoded in UTF-8.
Without the PCRE_UTF8 modifier the regexp fails on the Russian letter "R".
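
A possible reading of the failure on "R", offered as an assumption rather than something confirmed in this thread: if the letter meant is the Cyrillic capital Er (U+0420), its UTF-8 encoding is the two bytes 0xD0 0xA0. Without the u modifier the pattern works on individual bytes, so a byte such as 0xA0 can be classified on its own (for instance as whitespace under some locale-dependent character tables). The sketch below only demonstrates the byte-versus-character difference; the variable names are illustrative.

```php
<?php
// Assumption: the reporter's "R" is Cyrillic capital Er (U+0420).
// Written as raw bytes so the example does not depend on the file encoding.
$er = "\xD0\xA0";

// The UTF-8 encoding of U+0420 is two bytes.
echo bin2hex($er), "\n";

// Without u, "." matches one byte at a time, so the letter counts as two units.
$byteUnits = preg_match_all('/./s', $er);

// With u (PCRE_UTF8), "." matches one code point, so the letter counts as one.
$codePoints = preg_match_all('/./su', $er);

printf("byte mode: %d, UTF-8 mode: %d\n", $byteUnits, $codePoints);
```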
 [2007-04-27 17:43 UTC] tony2001@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.


 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Mar 28 19:01:29 2024 UTC