Bug #49550 PCRE Recursive Pattern Doesn't Work
Submitted: 2009-09-14 08:16 UTC Modified: 2009-09-14 14:56 UTC
From: spoon dot reloaded at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.3.0 OS: Linux
Private report: No CVE-ID: None
 [2009-09-14 08:16 UTC] spoon dot reloaded at gmail dot com
Description:
------------
I have a recursive PCRE pattern that should match any string (it does in Perl), but it fails to match a very simple string in PHP. I have reproduced this on both 5.2 and 5.3.

Reproduce code:
---------------
echo preg_match('/^(|.(?1))$/', 'ab'), "\n";


Expected result:
----------------
I expect it to print "1", because the pattern matches. In Perl, 'ab' =~ /^(|.(?1))$/ indeed matches. In fact, this pattern should match any string, because it matches either an empty string, or any character followed by something that matches the main part of the pattern itself (which matches any string).

Actual result:
--------------
It prints "0", indicating that it did not match.
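
A slightly fuller reproduction sketch, in case it helps anyone compare results across installs. The helper name matches_recursive() is made up for illustration; PCRE_VERSION and PHP_VERSION are PHP's built-in constants. The outcome for 'ab' may depend on the PCRE library PHP is linked against, so no expected output is listed here.

```php
<?php
// Hypothetical helper: runs the pattern from this report against $subject.
// preg_match() returns 1 on a match, 0 on no match, and false on error.
function matches_recursive($subject)
{
    return preg_match('/^(|.(?1))$/', $subject);
}

// Print the PHP and PCRE versions so results can be compared between systems.
echo 'PHP ', PHP_VERSION, ', PCRE ', PCRE_VERSION, "\n";

// Try the empty string, a one-character string, and the failing case.
foreach (array('', 'a', 'ab') as $subject) {
    echo var_export($subject, true), ' => ',
         var_export(matches_recursive($subject), true), "\n";
}
```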

 [2009-09-14 09:51 UTC] jani@php.net
Remove the $ and it works. And remember that PCRE != Perl.
 [2009-09-14 10:06 UTC] spoon dot reloaded at gmail dot com
> Remove the $ and it works.
Yeah, but that changes the meaning. I want to enforce that it matches the entire string, and without the $ it doesn't do that.

To give another example without a $, consider:

echo preg_match('/^(|.(?1))x/', 'abx'), "\n";

Again, it works in Perl, but not in PHP.
 [2009-09-14 10:24 UTC] spoon dot reloaded at gmail dot com
Never mind. I think the reason is that PCRE differs from Perl in that a recursive subpattern call (i.e. ?1, ?2, etc., but not ?0) only matches exactly what that subgroup matched before, even if the group has other untried possibilities. Is that correct?
 [2009-09-14 10:49 UTC] spoon dot reloaded at gmail dot com
But that still wouldn't explain why stuff like balanced parentheses matching /^(\(((?>[^()]+)|(?1))*\))$/ works. I am still waiting for an explanation on my example.
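
For reference, a minimal sketch of the balanced-parentheses pattern in use. The wrapper name is_balanced() is invented for illustration. One possible reading of why this case is unaffected, assuming the PCRE releases of this era treat a recursive call as an atomic group (as their pattern documentation describes): a call to (?1) that starts at an opening parenthesis can only end at its matching closing parenthesis, so there is never a second way to match that the engine would need to backtrack into.

```php
<?php
// Hypothetical wrapper around the balanced-parentheses pattern quoted above.
// Returns true only when the whole subject is one balanced parenthesised group.
function is_balanced($subject)
{
    return preg_match('/^(\(((?>[^()]+)|(?1))*\))$/', $subject) === 1;
}

// Print the result for a few balanced and unbalanced subjects.
foreach (array('()', '(a(b)c)', '(a(b', '((a)', 'abc') as $subject) {
    echo $subject, ' => ', var_export(is_balanced($subject), true), "\n";
}
```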
 [2009-09-14 10:58 UTC] spoon dot reloaded at gmail dot com
I found a solution:

echo preg_match('/^|(.(?:|(?1)))$/', 'ab'), "\n";

but I still don't understand why the other one doesn't work.
 [2009-09-14 11:05 UTC] spoon dot reloaded at gmail dot com
Oops, in that last one I meant:

echo preg_match('/^$|^(.(?:|(?1)))$/', 'ab'), "\n";
 [2009-09-14 11:14 UTC] spoon dot reloaded at gmail dot com
Oops, my previous workaround still doesn't work:

echo preg_match('/^$|^(.(?:|(?1)))$/', 'abc'), "\n";
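
One possible workaround for the anchored case, offered as a sketch and not confirmed anywhere in this thread: put the longer alternative first, so that each recursive call consumes as much as it can on its first attempt. This assumes the failure comes from the recursive call being treated as atomic (its first successful match is kept and not retried). The helper name matches_whole() is made up for illustration. It does not help the 'abx' example above, where the recursion would have to give characters back.

```php
<?php
// Hypothetical helper: the original pattern with its alternatives reordered,
// so the recursion tries ".(?1)" before falling back to the empty match.
// Note that "." does not match a newline unless the s modifier is added.
function matches_whole($subject)
{
    return preg_match('/^(.(?1)|)$/', $subject);
}

// Each of these subjects should be matched in full by the reordered pattern.
foreach (array('', 'a', 'ab', 'abc') as $subject) {
    echo var_export($subject, true), ' => ',
         var_export(matches_whole($subject), true), "\n";
}
```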
 [2009-09-14 14:56 UTC] rasmus@php.net
The only time a PCRE pattern-related report like this should be submitted as a PHP bug is if you can prove it works in the command line pcre debugging tool, but breaks in PHP. Otherwise you should be reporting stuff like this to the PCRE project.
 