Doc Bug #13439 unbound .* patterns don't work properly
Submitted: 2001-09-25 14:41 UTC Modified: 2004-07-08 17:40 UTC
Votes:2
Avg. Score:2.5 ± 1.5
Reproduced:0 of 1 (0.0%)
From: john at scl dot co dot uk Assigned:
Status: Not a bug Package: Documentation problem
PHP Version: 4.2.1 OS: linux 2.2.19
Private report: No CVE-ID: None
 [2001-09-25 14:41 UTC] john at scl dot co dot uk
These 2 are OK:

ereg_replace('..','b','aa') -> b
ereg_replace('.+','b','aa') -> b

But this is wrong:

ereg_replace('.*','b','aa') -> bb

You can "fix" it by binding to the beginning:

ereg_replace('^.*','b','aa') -> b

but not by binding to the end:

ereg_replace('.*$','b','aa') -> bb
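
ereg_replace() was removed in PHP 7.0, so the calls above no longer run on current PHP. As a rough reproduction, the sketch below runs the same patterns through preg_replace(). That PCRE behaves like the POSIX engine for these particular inputs is an assumption, not something stated in this report, and show() is a made-up helper name.

```php
<?php
// Hypothetical helper: replace every match of $pattern in 'aa' with 'b'.
function show(string $pattern): string
{
    return preg_replace($pattern, 'b', 'aa');
}

echo show('/../'), "\n";   // b
echo show('/.+/'), "\n";   // b
echo show('/.*/'), "\n";   // bb
echo show('/^.*/'), "\n";  // b
echo show('/.*$/'), "\n";  // bb
```

The second 'b' in the '.*' and '.*$' cases comes from an extra empty match at the end of the string, after 'aa' has already been consumed. '^.*' cannot match there because '^' only matches at the start of the subject.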

 [2002-06-02 13:58 UTC] derick@php.net
Sorry, but the bug system is not the appropriate forum for asking
support questions. Your problem does not imply a bug in PHP itself.
For a list of more appropriate places to ask for help using PHP,
please visit http://www.php.net/support.php

Thank you for your interest in PHP.


 [2002-06-07 14:48 UTC] john at scl dot co dot uk
Derick!

This _is_ a bug!  I've now upgraded to 4.2.1 and the
behaviour remains the same.  Consider just these two:

ereg_replace('.+','b','aa') -> b
ereg_replace('.*','b','aa') -> bb

and this from man 7 regex:

"An atom followed by `*' matches a sequence
of 0 or more matches of the atom.  An atom followed
by `+' matches a sequence of 1 or more matches of
the atom."

How can it NOT be a bug?
 [2002-06-07 16:48 UTC] sniper@php.net
Updated the version. Which OS are you running PHP on?

 [2002-06-08 07:50 UTC] john at scl dot co dot uk
I'm running linux 2.2.19.  I've also tested php 4.0.4pl1
running on linux 2.4.2 and the behaviour is the same.
 [2002-06-08 08:32 UTC] msopacua at idg dot nl
I think this could be a documentation issue, or a bug.

Remember that the ereg(i)_replace functions are always 'g' modified, i.e. they replace every match. So the issue would be whether * is greedy.
By nature + is greedy (because otherwise the 'or more' wouldn't make sense), but if * is considered non-greedy, then what PHP does is perfectly correct, as each 'a' matches .*.

It would at least be consistent if they were both greedy, as .*? isn't supported and you can't turn off the global behavior.
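
One way to check whether * is greedy is to list the individual matches rather than look at the replaced string. The sketch below does that with preg_match_all(), since the ereg functions are gone from PHP 7.0 onward. matches_with_offsets() is a hypothetical helper, and carrying the result over to ereg_replace() is an assumption.

```php
<?php
// Hypothetical helper: list every match of $pattern as "'text'@offset".
function matches_with_offsets(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
    $out = [];
    foreach ($m[0] as [$text, $offset]) {
        $out[] = "'" . $text . "'@" . $offset;
    }
    return $out;
}

echo implode(', ', matches_with_offsets('/.+/', 'aa')), "\n"; // 'aa'@0
echo implode(', ', matches_with_offsets('/.*/', 'aa')), "\n"; // 'aa'@0, ''@2
```

If * were non-greedy, the first match would not span both characters. Here it does, and the only difference from + is one additional empty match at offset 2, the end of the string.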
 [2002-06-08 13:13 UTC] john at scl dot co dot uk
AFAIK, POSIX REs have no concept of non-greedy quantifiers.
All quantifiers are greedy.  But even supposing that this
were a possibility, i.e. the * quantifier is non-greedy, the following needs explaining:

1)  Why the difference between + and * ?

2)  ereg_replace('.*','b','aa') should produce an _infinite_
string of b's because the minimum match is the empty string
and the global ('g' modified) behavior means that there are
an infinity of minimum matches available.

3) ereg_replace('.*c','b','aac') should produce 'ab' but it
doesn't, it produces just 'b'.  In other words, the addition
of the 'c' following the * quantifier has changed the
behaviour from non-greedy to greedy!

There is no consistent explanation available for this
behaviour.  The behaviour is just plain wrong!
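
A simplified model of a global replace may help with points 1 to 3. It is only a sketch, not the actual ereg_replace() implementation: replace_all_sketch() is a made-up name, and it uses preg_match() because the ereg functions were removed in PHP 7.0. Each search takes the leftmost match with greedy quantifiers, and an empty match moves the scan forward one character so the loop terminates.

```php
<?php
// Simplified model of "replace all matches", assuming greedy quantifiers.
function replace_all_sketch(string $pattern, string $replacement, string $subject): string
{
    $out = '';
    $pos = 0;
    $len = strlen($subject);

    while ($pos <= $len) {
        // Find the next match at or after $pos; stop when there is none.
        if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $pos) !== 1) {
            break;
        }
        [$text, $start] = $m[0];

        // Copy the unmatched text before the match, then the replacement.
        $out .= substr($subject, $pos, $start - $pos) . $replacement;

        if ($text === '') {
            // Empty match: copy one character and step past it, so the
            // same empty match is not found again forever.
            if ($start < $len) {
                $out .= $subject[$start];
            }
            $pos = $start + 1;
        } else {
            $pos = $start + strlen($text);
        }
    }

    // Copy whatever is left after the last match.
    if ($pos < $len) {
        $out .= substr($subject, $pos);
    }
    return $out;
}

echo replace_all_sketch('/.+/', 'b', 'aa'), "\n";   // b
echo replace_all_sketch('/.*/', 'b', 'aa'), "\n";   // bb
echo replace_all_sketch('/.*c/', 'b', 'aac'), "\n"; // b
```

Under this model, '.*' first consumes 'aa' and then matches the empty string exactly once at the end of the string, which gives the second 'b' without an infinite run. '.+' cannot match the empty string there, and '.*c' consumes 'aac' whole, leaving no position where a 'c' could still be matched.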
 [2002-10-27 22:28 UTC] sterling@php.net
So it goes...  
 [2004-07-08 17:40 UTC] nlopess@php.net
The regex documentation isn't supposed to be an extensive regular expressions manual, as they are complex.

The current behaviour is correct, so this is only a problem with your regex. (I've confirmed this with the author of O'Reilly's regex book.)
 