Bug #30071 Many problems with Perl-Compatible Regular Expressions
Submitted: 2004-09-13 01:57 UTC Modified: 2004-09-13 06:37 UTC
From: wesleywex at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 4.3.9RC2 OS: Windows 2000 SP4
Private report: No CVE-ID: None
 [2004-09-13 01:57 UTC] wesleywex at gmail dot com
Description:
------------
I'm trying to replace multiple tags such as <wsb></wsb> and <wsimg> with a single regex. With only a few tags I saw no problem (which suggests the regex is valid), but once the number of tags passes some unknown limit, the system behaves very oddly:

- The text is not parsed correctly
- The script takes much longer to process if just one short new line is inserted

You can see these glitches in the 3 files I've prepared. All 3 have similar contents; I changed only one line between them.

Reproduce code:
---------------
Variations:
1: http://wstec.net/tmp/php_bug_pcre_02/code_01.txt
2: http://wstec.net/tmp/php_bug_pcre_02/code_02.txt
3: http://wstec.net/tmp/php_bug_pcre_02/code_03.txt

Actual result:
--------------
Pay attention to the line with the <wsx />

1: http://wstec.net/tmp/php_bug_pcre_02/code_01.php
- Took more than 1 second to parse the text just 2 times
- Parsed everything as it should be parsed
- Note that there are no spaces in the text between <!-- and -->

2: http://wstec.net/tmp/php_bug_pcre_02/code_02.php
- Took more than 1 second to parse the text just 1 time
- Not everything that should have been parsed was
- Note that the only difference is the spaces in the text between <!-- and -->

3: http://wstec.net/tmp/php_bug_pcre_02/code_03.php
- Took less than 1 second to parse the text the same 2 times as the first script
- Parsed everything as it should be parsed
- Note that the only difference from the previous scripts is the line starting with <wsx />

I hope you understand the issue now

 [2004-09-13 04:29 UTC] rasmus@php.net
It's not that hard to write recursive regular expressions that bring the regex parser to its knees. As far as I can tell this is what you have done. But it is hard to tell, as you don't even explain what you are expecting from a simple case such as <wsb>abc</wsb>.
Plugging that into your regex gets me:
NEO_HTML( "b", "", "abc" )
Is that really what you are trying to get?

If you really want to get to the bottom of it, grab PCRE from www.pcre.org and plug your regular expression into PCRE directly using the provided pcredemo.c.  This way you won't be using any PHP code.  If it works perfectly, come back and say so.  If it still shows problems, file a bug with PCRE.
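To illustrate the kind of blow-up described above, here is a minimal sketch. It does not use the reporter's regex (which is only in the linked files); the pattern is modelled on the example in the PHP manual's preg_last_error() page, and the helper name probe_regex is hypothetical. It also assumes PHP 5.2 or later, where preg_last_error() and the pcre.backtrack_limit setting exist; neither is available in 4.3.9.

```php
<?php
// Hypothetical helper (not part of PHP): runs a pattern once and reports
// both the preg_match() result and the PCRE error code.
function probe_regex($pattern, $subject)
{
    $result = preg_match($pattern, $subject);
    return array(
        'result' => $result,           // 1 = match, 0 = no match, false = error
        'error'  => preg_last_error(), // PREG_NO_ERROR if PCRE finished normally
    );
}

// (?:\D+|<\d+>)* puts a quantifier inside a quantifier, so the engine can
// split a run of non-digits in a very large number of ways. The subject has
// no "!" or "?", so every split has to be tried and rejected.
$probe = probe_regex('/(?:\D+|<\d+>)*[!?]/', 'foobar foobar foobar');

if ($probe['result'] === false) {
    if ($probe['error'] === PREG_BACKTRACK_LIMIT_ERROR) {
        echo "PCRE gave up: backtrack limit exhausted\n";
    } else {
        echo "PCRE gave up: error code ", $probe['error'], "\n";
    }
} else {
    echo "PCRE finished, result: ", $probe['result'], "\n";
}
```

Which branch runs depends on the PCRE version and the configured pcre.backtrack_limit. Either way the pattern cannot match this subject, and each extra character in the subject can roughly double the work, which is consistent with one added line making a script much slower.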
 [2004-09-13 05:39 UTC] wesleywex at gmail dot com
Yes, the real regex is a function (in fact it's $this->output_neohtml( ... )), and that IS what I expect, but this is just what I DON'T get in this example.

I thought PHP would help me solve this bug, but, as you told me, I must report it to the PCRE creators. Can you say where? Because I didn't find anything on the link you sent.
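For reference, the output quoted earlier in this thread for <wsb>abc</wsb> can be produced by a simple non-recursive pattern. This is only a sketch: the reporter's actual regex and the output_neohtml() method are not shown in this report, so the pattern, the function name neo_html_sketch, and the treatment of attributes are all assumptions. It uses an anonymous function, which needs PHP 5.3 or later.

```php
<?php
// Hypothetical stand-in for the reporter's replacement step.
function neo_html_sketch($text)
{
    // <ws(\w+)   tag name after the "ws" prefix
    // ([^>]*)>   raw attribute text, possibly empty
    // (.*?)      tag body, matched lazily
    // </ws\1>    closing tag with the same name as the opening tag
    $pattern = '~<ws(\w+)([^>]*)>(.*?)</ws\1>~s';

    $out = preg_replace_callback($pattern, function ($m) {
        return sprintf('NEO_HTML( "%s", "%s", "%s" )', $m[1], trim($m[2]), $m[3]);
    }, $text);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $out === null ? $text : $out;
}

echo neo_html_sketch('<wsb>abc</wsb>'), "\n";
// prints: NEO_HTML( "b", "", "abc" )
```

Nested tags of the same name and self-closing tags such as <wsx /> are not handled by this sketch; it only shows the simple case discussed here.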
 [2004-09-13 06:37 UTC] magnus@php.net
Report PCRE bugs here (I guess): http://sourceforge.net/tracker/?group_id=10194&atid=360194

Since it wasn't a bug in PHP, I'm marking this bogus.
 