PHP :: Bug #30071 :: Many problems with Perl-Compatible Regular Expressions

Bug #30071	Many problems with Perl-Compatible Regular Expressions
Submitted:	2004-09-13 01:57 UTC	Modified:	2004-09-13 06:37 UTC
From:	wesleywex at gmail dot com	Assigned:
Status:	Not a bug	Package:	PCRE related
PHP Version:	4.3.9RC2	OS:	Windows 2000 SP4
Private report:	No	CVE-ID:	None

[2004-09-13 01:57 UTC] wesleywex at gmail dot com

Description:
------------
I'm trying to replace multiple tags like <wsb></wsb> and <wsimg> with a single regex. While I was using only a few tags, I saw no problem (which indicates that the regex is valid), but when the number of tags reaches some unknown limit, the system does very odd things, like:

- Not parsing the text correctly
- Taking much more time to process the script if just one small new line is inserted

You can see these glitches in 3 files I've prepared. All 3 have similar contents; I just changed one line between them.

Reproduce code:
---------------
Variations:
1: http://wstec.net/tmp/php_bug_pcre_02/code_01.txt
2: http://wstec.net/tmp/php_bug_pcre_02/code_02.txt
3: http://wstec.net/tmp/php_bug_pcre_02/code_03.txt

Actual result:
--------------
Pay attention to the line with <wsx />

1: http://wstec.net/tmp/php_bug_pcre_02/code_01.php
- Took more than 1 second to parse the text just 2 times
- Parsed everything as it should be parsed
- Note that there are no spaces in the text between <!-- and -->

2: http://wstec.net/tmp/php_bug_pcre_02/code_02.php
- Took more than 1 second to parse the text just once
- Not everything that should be parsed was
- Note that the only difference is the spaces in the text between <!-- and -->

3: http://wstec.net/tmp/php_bug_pcre_02/code_03.php
- Took less than 1 second to parse the text the same 2 times as the first script
- Parsed everything as it should be parsed
- Note that the only difference from the previous scripts is the line starting with <wsx />

I hope you understand the issue now.

[2004-09-13 04:29 UTC] rasmus@php.net

It's not that hard to write recursive regular expressions that bring the regex parser to its knees.  As far as I can tell this is what you have done.  But it is hard to tell as you don't even explain what you are expecting from the simple case of something like <wsb>abc</wsb>.
Plugging that into your regex gets me:
NEO_HTML( "b", "", "abc" )
Is that really what you are trying to get?

If you really want to get to the bottom of it, grab PCRE from www.pcre.org and plug your regular expression into PCRE directly using the provided pcredemo.c.  This way you won't be using any PHP code.  If it works perfectly, come back and say so.  If it still shows problems, file a bug with PCRE.
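
The point above about expressions that overwhelm the regex engine can be illustrated with a minimal sketch. It does not use the reporter's regex (that is only available in the linked files); the pattern is a textbook nested-quantifier case, and try_match() is a hypothetical helper, not a PHP function. It assumes PHP 7 or later, since preg_last_error() does not exist in the 4.3.9RC2 build from this report.

```php
<?php
// Hypothetical helper (not part of PHP): run a pattern and report how it ended.
function try_match(string $pattern, string $subject): string
{
    $start  = microtime(true);
    $result = preg_match($pattern, $subject);
    $ms     = (microtime(true) - $start) * 1000;

    if ($result === false) {
        // false means the engine gave up with an error; it is not the same as "no match" (0).
        $code   = preg_last_error();
        $status = $code === PREG_BACKTRACK_LIMIT_ERROR
            ? 'error: backtrack limit exhausted'
            : 'error code ' . $code;
    } else {
        $status = $result === 1 ? 'match' : 'no match';
    }
    return sprintf('%s (%.1f ms)', $status, $ms);
}

// Nested quantifiers: (a+)+ can split a run of "a" in exponentially many ways,
// and the trailing "b" makes every one of those splits fail.
$subject = str_repeat('a', 25) . 'b';

echo 'nested: ', try_match('/^(a+)+$/', $subject), "\n";
// Accepts the same strings without the nesting, so failure is found in one linear scan.
echo 'flat:   ', try_match('/^a+$/', $subject), "\n";
```

Whether the nested pattern reports an error or merely runs slowly depends on the PHP version and on settings such as pcre.backtrack_limit and pcre.jit, so no particular output is claimed here. The design point is only that checking for a false return separates "the engine gave up" from "the text did not match", which are easy to confuse when a replacement silently does less than expected.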

[2004-09-13 05:39 UTC] wesleywex at gmail dot com

Yes, the real regex is a function (in fact it's $this->output_neohtml( ... )), and that IS what I expect, but this is just what I DON'T get in this example.

I thought PHP would help me solve this bug, but, as you told me, I must report it to the PCRE creators. Can you say where? Because I didn't find anything on the link you sent.
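
For reference, the mapping confirmed above (<wsb>abc</wsb> becoming a NEO_HTML call with the tag name, attributes, and body) can be sketched without recursion. The pattern and the to_neo_html() callback are hypothetical stand-ins, not the regex from the linked files, and the sketch handles only non-nested paired tags.

```php
<?php
// Hypothetical stand-in for the reporter's regex: <wsNAME ATTRS>BODY</wsNAME>.
// \1 requires the closing tag to carry the same name; the s flag lets BODY span lines.
$pattern = '~<ws(\w+)([^>]*)>(.*?)</ws\1>~s';

// Hypothetical callback: formats one match as the NEO_HTML call quoted in this thread.
function to_neo_html(array $m): string
{
    return sprintf('NEO_HTML( "%s", "%s", "%s" )', $m[1], trim($m[2]), $m[3]);
}

echo preg_replace_callback($pattern, 'to_neo_html', '<wsb>abc</wsb>'), "\n";
// prints: NEO_HTML( "b", "", "abc" )
```

Because the body uses a lazy quantifier and no nested groups, the amount of backtracking stays small on ordinary input; nested tags would need a different approach, such as repeated passes or a real parser.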

[2004-09-13 06:37 UTC] magnus@php.net

Report PCRE bugs here (I guess): http://sourceforge.net/tracker/?group_id=10194&atid=360194

Since it wasn't a bug in PHP, I'm marking this bogus.
