Doc Bug #13439 unbound .* patterns don't work properly
Submitted: 2001-09-25 14:41 UTC Modified: 2004-07-08 17:40 UTC
Votes:2
Avg. Score:2.5 ± 1.5
Reproduced:0 of 1 (0.0%)
From: john at scl dot co dot uk Assigned:
Status: Not a bug Package: Documentation problem
PHP Version: 4.2.1 OS: linux 2.2.19
Private report: No CVE-ID: None
 [2001-09-25 14:41 UTC] john at scl dot co dot uk
These 2 are OK:

ereg_replace('..','b','aa') -> b
ereg_replace('.+','b','aa') -> b

But this is wrong:

ereg_replace('.*','b','aa') -> bb

You can "fix" it by binding to the beginning:

ereg_replace('^.*','b','aa') -> b

but not by binding to the end:

ereg_replace('.*$','b','aa') -> bb
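
One way to see where the second 'b' comes from is to list the span of every successive match. The following is a minimal sketch in Python, not PHP: it assumes that Python's `re` engine finds the same matches as the POSIX engine behind `ereg_replace` for these simple patterns, and `match_spans` is a hypothetical helper name introduced here for illustration.

```python
import re

def match_spans(pattern, subject):
    """Return the (start, end) span of every successive match in subject."""
    return [m.span() for m in re.finditer(pattern, subject)]

# Print the spans for each pattern from the report against the subject 'aa'.
for pattern in ["..", ".+", ".*", "^.*", ".*$"]:
    print(repr(pattern), match_spans(pattern, "aa"))
```

Under that assumption, '.*' and '.*$' each report two spans: (0, 2) for 'aa', then an empty match (2, 2) at the end of the string. The patterns '..', '.+' and '^.*' report only (0, 2), because none of them can match the empty string at offset 2.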

 [2002-06-02 13:58 UTC] derick@php.net
Sorry, but the bug system is not the appropriate forum for asking
support questions. Your problem does not imply a bug in PHP itself.
For a list of more appropriate places to ask for help using PHP,
please visit http://www.php.net/support.php

Thank you for your interest in PHP.


 [2002-06-07 14:48 UTC] john at scl dot co dot uk
Derick!

This _is_ a bug!  I've now upgraded to 4.2.1 and the
behaviour remains the same.  Consider just these two:

ereg_replace('.+','b','aa') -> b
ereg_replace('.*','b','aa') -> bb

and this from man 7 regex:

"An atom followed by `*' matches a sequence
of 0 or more matches of the atom.  An atom followed
by `+' matches a sequence of 1 or more matches of
the atom."

How can it NOT be a bug?
 [2002-06-07 16:48 UTC] sniper@php.net
Updated the version. Which OS are you running PHP on?

 [2002-06-08 07:50 UTC] john at scl dot co dot uk
I'm running linux 2.2.19.  I've also tested php 4.0.4pl1
running on linux 2.4.2 and the behaviour is the same.
 [2002-06-08 08:32 UTC] msopacua at idg dot nl
I think this could be a documentation issue, or a bug.

Remember that the ereg(i)_replace functions are always 'g' modified, so the question is whether * is greedy.
By nature + is greedy (because otherwise the 'or more' wouldn't make sense), but if * is considered not to be greedy, then what PHP does is perfectly correct, as each 'a' matches .*.

It would at least be consistent if they were both greedy, as .*? isn't supported and you can't turn off the global behavior.
 [2002-06-08 13:13 UTC] john at scl dot co dot uk
AFAIK, POSIX re's have no concept of non-greedy quantifiers.
All quantifiers are greedy.  But even supposing that this
were a possibility, i.e. the * quantifier is non-greedy, the following needs explaining:

1)  Why the difference between + and * ?

2)  ereg_replace('.*','b','aa') should produce an _infinite_
string of b's because the minimum match is the empty string
and the global ('g' modified) behavior means that there are
an infinity of minimum matches available.

3) ereg_replace('.*c','b','aac') should produce 'ab' but it
doesn't, it produces just 'b'.  In other words, the addition
of the 'c' following the * quantifier has changed the
behaviour from non-greedy to greedy!

There is no consistent explanation available for this
behaviour.  The behaviour is just plain wrong!
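
The three points above can be checked against a sketch of a global replace loop. This is an illustration in Python, not PHP's actual source: `global_replace` is a hypothetical name, and the sketch assumes a 'g'-style replace that, after an empty match, copies one character through and steps forward. Python's `re` is Perl-style rather than POSIX, and the lazy '.*?' in the last line has no POSIX equivalent; it is included only to show what a non-greedy * would produce.

```python
import re

def global_replace(pattern, replacement, subject):
    """Replace every match of pattern in subject, scanning left to right."""
    regex = re.compile(pattern)
    pieces = []
    pos = 0
    while pos <= len(subject):
        match = regex.search(subject, pos)
        if match is None:
            break
        # Keep the unmatched text before the match, then emit the replacement.
        pieces.append(subject[pos:match.start()])
        pieces.append(replacement)
        if match.end() == match.start():
            # Empty match: copy one character through and step past it,
            # so the same empty match is never found twice.
            pieces.append(subject[match.start():match.start() + 1])
            pos = match.start() + 1
        else:
            pos = match.end()
    # Whatever follows the last match is kept unchanged.
    pieces.append(subject[pos:])
    return "".join(pieces)

print(global_replace(".+", "b", "aa"))    # b
print(global_replace(".*", "b", "aa"))    # bb
print(global_replace(".*c", "b", "aac"))  # b
print(global_replace(".*?", "b", "aa"))   # babab
```

Under these assumptions the output is finite because each empty match advances the position by one, so at most one empty match is replaced per offset. A greedy '.*' first consumes 'aa' and then matches once more, emptily, at the end of the string, which gives 'bb'. '.+' cannot match emptily, which gives 'b'. A non-greedy * would give 'babab' rather than 'bb', so non-greediness does not account for the reported result.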
 [2002-10-27 22:28 UTC] sterling@php.net
So it goes...  
 [2004-07-08 17:40 UTC] nlopess@php.net
The regex documentation isn't supposed to be an extensive manual on regular expressions, as they are complex.

The current behaviour is correct, so this is only a problem with your regex. (I've confirmed this with the author of O'Reilly's regex book.)
 