php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #23946 [ep]reg_replace "z?$" with "a" in "xyz" yields "xyaa" instead of "xya"
Submitted: 2003-06-01 22:38 UTC Modified: 2003-06-18 09:00 UTC
Votes:3
Avg. Score:3.7 ± 0.9
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:2 (100.0%)
From: stuge-phpbugs at cdy dot org Assigned: andrei (profile)
Status: Not a bug Package: *Regular Expressions
PHP Version: 4.3.2 OS: Linux
Private report: No CVE-ID: None
View Add Comment Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
You can add a comment by following this link or if you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: stuge-phpbugs at cdy dot org
New email:
PHP Version: OS:

 

 [2003-06-01 22:38 UTC] stuge-phpbugs at cdy dot org
Verified with 4.3.0 and 4.3.2rc2 but haven't tried 4.3.2. NEWS for 4.3.2 do not mention any *reg_replace changes. Reproducible with eregi_replace().

<?php
  $f="xyz";
  $g=ereg_replace("z?$","a",$f);
  $h=preg_replace("/z?$/","a",$f);
  print $f; /* xyz */
  print " ".$g; /* xyaa, I want xya */
  print " ".$h; /* xyaa, I want xya */
?>

At least in CVS php4/ext/standard/reg.c I believe the problem is that regexec() on line 306 will be called a second time after the first iteration has already matched to the end of the string.

Matching from the end of the string should only be allowed on the very first iteration.

If this analysis is correct, the following patch might work, but I haven't tried it.

--- reg.c.org   2003-06-02 05:23:51.000000000 +0200
+++ reg.c       2003-06-02 05:26:06.000000000 +0200
@@ -302,7 +302,7 @@
 
        err = pos = 0;
        buf[0] = '\0';
-       while (!err) {
+       do {
                err = regexec(&re, &string[pos], re.re_nsub+1, subs, (pos ? REG_NOTBOL : 0));
 
                if (err && err != REG_NOMATCH) {
@@ -396,7 +396,7 @@
                        /* stick that last bit of string on our output */
                        strcat(buf, &string[pos]);
                }
-       }
+       } while(!err && string[pos]);
 
        /* don't want to leak memory .. */
        efree(subs);

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2003-06-03 12:59 UTC] elmicha@php.net
Verified with 4.3.2 (and 4.3.1 and 4.2.3). Similar to #23903, but here I see why you want to use [ep]reg_replace.
 [2003-06-06 20:34 UTC] stuge-phpbugs at cdy dot org
$ echo xyz|perl -e 'while(<STDIN>) { $_=~ s/z?$/a/;print}'
xya
$ echo xyz|perl -e 'while(<STDIN>) { $_=~ s/z?$/a/g;print}'
xyaa
a$ echo xyz|sed -e 's/z\?$/a/g'
xya
$ echo xyz|sed -e 's/z\?$/a/'
xya

perl and sed both behave the way I would expect.
[ep]reg_replace() don't. :)

A comment from php.net/manual/en/pcre.pattern.modifiers.php:
/g is default in PHP. You don't need to set it.

preg_replace() similarity with perl may not be desired, I only included it because it showed the exact same behaviour while using a different regex code base, indicating the same algorithm.

I believe Andrei's comment in 23903 is a bit quick, at least if this is a dupe, which I would say it is.

Again, I believe the error to be that the regexp is "ran" one extra time after all of the input string has already been matched and processed inside [ep]reg_replace().
Because of this, any regexp that matches the empty string will cause one extra replacement to occur at the end of the input string.

At least inside ereg_replace() the loop should just quit if the entire string has been processed after the first iteration.

I'll try my patch and recomment.
 [2003-06-06 21:09 UTC] stuge-phpbugs at cdy dot org
Ok. This patch works in that it gives me the expected results. I modified it slightly to care less about bytes in the string and more about the length variables, that way it's at least widechar-prepared. Woo-hoo! :)

Currently running make test.. All reg tests PASS, I've also added a test for this bug.

Here's the patch against current CVS:
---8<---
diff -ur php4.cvs/ext/pcre/php_pcre.c php4.new/ext/pcre/php_pcre.c
--- php4.cvs/ext/pcre/php_pcre.c        2003-06-07 03:51:13.000000000 +0200
+++ php4.new/ext/pcre/php_pcre.c        2003-06-07 03:52:02.000000000 +0200
@@ -797,7 +797,7 @@
        *result_len = 0;
        start_offset = 0;
 
-       while (1) {
+       do {
                /* Execute the regular expression. */
                count = pcre_exec(re, extra, subject, subject_len, start_offset,
                                                  exoptions|g_notempty, offsets,
 size_offsets);
@@ -933,7 +933,7 @@
 
                /* Advance to the next piece. */
                start_offset = offsets[1];
-       }
+       } while(start_offset < subject_len);
 
        efree(offsets);
 
diff -ur php4.cvs/ext/standard/reg.c php4.new/ext/standard/reg.c
--- php4.cvs/ext/standard/reg.c 2003-06-07 03:49:19.000000000 +0200
+++ php4.new/ext/standard/reg.c 2003-06-07 03:50:43.000000000 +0200
@@ -302,7 +302,7 @@
 
        err = pos = 0;
        buf[0] = '\0';
-       while (!err) {
+       do {
                err = regexec(&re, &string[pos], re.re_nsub+1, subs, (pos ? REG_
NOTBOL : 0));
 
                if (err && err != REG_NOMATCH) {
@@ -396,7 +396,7 @@
                        /* stick that last bit of string on our output */
                        strcat(buf, &string[pos]);
                }
-       }
+       } while(!err && pos < string_len);
 
        /* don't want to leak memory .. */
        efree(subs);
--->8---

And ext/standard/tests/reg/017.phpt:
---8<---
--TEST--
empty expression replaced extra time at end-of-string
--POST--
--GET--
--FILE--
<?php echo ereg_replace('.?$',"a","xyz")?>
--EXPECT--
xya
--->8---

I make no warranties whatsoever that this will not break things, although I don't think it should. :)
 [2003-06-07 04:45 UTC] sniper@php.net
Andrei, any opinions about the patches..?

 [2003-06-18 09:00 UTC] andrei@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

The creator of PCRE and I agree: if this is what Perl does, then PHP should do the same as to avoid confusion, because the current PCRE support has been predicated on Perl compatibility.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Apr 19 13:01:30 2024 UTC