Bug #52118 preg_replace gives bad output if matching group equals whole string
Submitted: 2010-06-18 11:25 UTC Modified: 2010-06-18 12:04 UTC
From: tomasz dot slominski at gmail dot com Assigned:
Status: Not a bug Package: *Regular Expressions
PHP Version: Irrelevant OS: WIN XP SP3
Private report: No CVE-ID: None
 [2010-06-18 11:25 UTC] tomasz dot slominski at gmail dot com
Description:
------------
preg_replace misbehaves when the pattern is a single group that matches the
whole string, (.*). The substitution appears to be made twice instead of once.

Test script:
---------------
var_dump(preg_replace(array("/(.*)/"), array('!$1'),'test'));
var_dump(preg_replace(array("/(.*)/"), array('$1!'),'test'));
var_dump(preg_replace(array("/(.*)/"), array('!$1!'),'test'));

Expected result:
----------------
string '!test' (length=5)
string 'test!' (length=5)
string '!test!' (length=6)


Actual result:
--------------
string '!test!' (length=6)
string 'test!!' (length=6)
string '!test!!!' (length=8)

 [2010-06-18 11:40 UTC] tomasz dot slominski at gmail dot com
Quick workaround:   var_dump(preg_replace(array("/(.+)(.*)/"), array('!$1$2'),'test')); 
gives the expected output (!test)
 [2010-06-18 12:04 UTC] salathe@php.net
-Status: Open +Status: Bogus
 [2010-06-18 12:04 UTC] salathe@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

This is expected behaviour. When a match is found, PCRE checks for the next 
possible match starting at the point immediately after the previous match.  In 
your case, the .* first matches the entire subject string "test", then matches 
again at the very end of the string, since the * quantifier allows an empty 
match. Each of the two matches is replaced, hence the doubled output.
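
The two matches described above can be made visible by listing them with their offsets. The following is only a minimal sketch and not part of the original report; the variable names are illustrative. It also shows three possible ways to get a single replacement (anchoring the pattern, requiring at least one character, or passing a limit of 1).

```php
<?php
// Sketch: list every match of /(.*)/ in 'test' together with its offset.
preg_match_all('/(.*)/', 'test', $found, PREG_OFFSET_CAPTURE);
foreach ($found[0] as $m) {
    // $m[0] is the matched text, $m[1] is its offset in the subject.
    printf("offset %d: '%s'\n", $m[1], $m[0]);
}
// offset 0: 'test'
// offset 4: ''

// Possible ways to end up with exactly one replacement:
$anchored = preg_replace('/^(.*)/', '$1!', 'test');   // ^ cannot match at offset 4
$nonEmpty = preg_replace('/(.+)/', '$1!', 'test');    // + rejects the empty match
$limited  = preg_replace('/(.*)/', '$1!', 'test', 1); // limit = 1 replacement
var_dump($anchored, $nonEmpty, $limited);             // each is "test!"
```

Which variant fits best depends on the subject: the anchored form assumes a single-line subject, while the limit form simply stops after the first match.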
 [2010-06-18 13:36 UTC] tomasz dot slominski at gmail dot com
OK, but shouldn't a greedy .* consume the whole string? 

 var_dump(preg_replace("/(.*)/U", '$1!','test'));
 gives
 string '!t!!e!!s!!t!!' (length=13)

and that's fine, but why does

  var_dump(preg_replace("/(.*)/", '$1!','test'));

produce 

 string 'test!!'  (matching 'test', then nothing)

instead of 

 string '!test!!'  (matching nothing, then 'test', then nothing)

or

 string 'test!'  (matching 'test' only)?

It's counter-intuitive at the very least.
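
One way to compare the two cases is to list the individual matches for both patterns. This is a hedged sketch rather than anything from the thread: list_matches is a hypothetical helper written for this illustration, and the explanation in the comments assumes the usual PCRE rule that, after an empty match, a non-empty match is tried at the same position before moving on.

```php
<?php
// Hypothetical helper: return every match as "offset:'text'", space separated.
function list_matches($pattern, $subject) {
    preg_match_all($pattern, $subject, $found, PREG_OFFSET_CAPTURE);
    $out = array();
    foreach ($found[0] as $m) {
        $out[] = $m[1] . ":'" . $m[0] . "'";
    }
    return implode(' ', $out);
}

// Ungreedy: the empty match is preferred at each position, then one character.
$ungreedy = list_matches('/(.*)/U', 'test');
// Greedy: the longest match wins at offset 0, so no empty match precedes 'test';
// only the empty match at the end of the subject remains.
$greedy = list_matches('/(.*)/', 'test');

echo $ungreedy, "\n"; // 0:'' 0:'t' 1:'' 1:'e' 2:'' 2:'s' 3:'' 3:'t' 4:''
echo $greedy, "\n";   // 0:'test' 4:''
```

Under that reading, '!test!!' cannot occur because the greedy match at offset 0 already takes 'test', and 'test!' does not occur because offset 4 is still a valid position for an empty match.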
 