Bug #3850 ereg_replace broken when pattern string is too long
Submitted: 2000-03-16 16:24 UTC Modified: 2000-03-16 17:30 UTC
From: brian at phorum dot org Assigned:
Status: Closed Package: Misbehaving function
PHP Version: 4.0 Beta 2 OS: Solaris and Windows
Private report: No CVE-ID: None

 

 [2000-03-16 16:24 UTC] brian at phorum dot org
This script:
------------------------------
$body='<table>';
$body=ereg_replace("<(/*[b|u|i|font|ol|ul|li|img|a] *[^>]*)>", "[\\1]", $body);
echo $body;

produces:
------------------------------
[table]

That is wrong.

Shorten the pattern:
------------------------------
$body='<table>';
$body=ereg_replace("<(/*[b|u|i] *[^>]*)>", "[\\1]", $body);
echo $body;

That produces:
------------------------------
<table>

As it should.  This appears to be a problem in 3.x as well.

 [2000-03-16 17:30 UTC] torben at cvs dot php dot net
(note: sorry about doubling this up on php-dev: brainfart on my part)

No, that output is correct. I think you're misinterpreting what the
[b|u|i|font|ol|ul|li|img|a] is doing. It means 'match any one of the
characters b, u, i, f, o, n, t, l, m, g, a, or |', when what I think you
want is (b|u|i|font|ol|ul|li|img|a), which means 'match any of the
strings b, u, i, font, ol, ul, li, img, or a'.

You are asking the regexp to match:

 o An opening angle bracket, followed by
 o An atom consisting of:
   o zero or more slashes, followed by
   o one of the characters 'buifontlmga|', followed by
   o zero or more spaces, followed by
   o zero or more of anything which isn't a closing angle bracket
 o ...and a closing angle bracket.
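
For anyone checking this on a current PHP, the difference between the
bracket expression and the parenthesised alternation can be sketched as
below. This is only an illustration under one assumption: the ereg
functions were removed in PHP 7, so preg_match stands in for them here.
A bracket expression picks a single character in both regex flavours.

```php
<?php
// Sketch only. Assumption: preg_match replaces the removed ereg functions.
$class = '/^[b|u|i|font|ol|ul|li|img|a]$/';  // exactly one character from the set
$group = '/^(b|u|i|font|ol|ul|li|img|a)$/';  // exactly one whole string from the list

$classHits = [];
$groupHits = [];
foreach (['b', 't', '|', 'font', 'table'] as $s) {
    $classHits[$s] = preg_match($class, $s);  // 1 on match, 0 otherwise
    $groupHits[$s] = preg_match($group, $s);
}

print_r($classHits);  // b, t and | match; font and table do not
print_r($groupHits);  // b and font match; t, | and table do not
```

The class accepts 't' and even '|' because every character between the
brackets is a set member, while the group accepts only the listed strings.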

> Shorten the pattern:
> ------------------------------
> $body='<table>';
> $body=ereg_replace("<(/*[b|u|i] *[^>]*)>", "[\\1]", $body);
> echo $body;
> 
> That produces:
> ------------------------------
> <table>
> 
> As it should.  This appears to be a problem in 3.x as well.

The reason the second one works is not the length of the regexp, but
rather the fact that the [b|u|i] won't match the first character of
'table', whereas [b|u|i|font|ol|ul|li|img|a] will.
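
A small reproduction of the two original patterns, again assuming
preg_replace as a stand-in for ereg_replace (removed in PHP 7) and using
'#' as the pattern delimiter so the slash needs no escaping:

```php
<?php
// Sketch only. Assumption: preg_replace replaces the removed ereg_replace.
$body = '<table>';

// Long class: 't' is a member (it comes from 'font'), so the tag matches.
$long = preg_replace('#<(/*[b|u|i|font|ol|ul|li|img|a] *[^>]*)>#', '[\1]', $body);

// Short class: only b, u, i and | are members, so 't' fails and nothing is replaced.
$short = preg_replace('#<(/*[b|u|i] *[^>]*)>#', '[\1]', $body);

echo $long, "\n";   // [table]
echo $short, "\n";  // <table>
```

Both patterns do what they say; only the set of allowed first characters
differs between them.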

Here's a modified version and a quick test case which produces
what I think you're trying to achieve:

---8<---test.php---8<---
#!/usr/local/bin/php -q

<?php
$filename = 'http://www.thebuttlesschaps.com/index.html';
$infile = file( $filename )
   or die( "Could not open file '$filename' for reading.\n" );

$infile = join( '', $infile );

$newfile = ereg_replace( '<(/?(b|u|i|font|ol|ul|li|img|a)( [^>]*)*)>', 
                         '[\1]', $infile );

echo $newfile;
?>
---8<---test.php---8<---

An example snippet produced by this script: 

. . .

    <table width="100%" border="0" cellpadding="3">
      <tr>
        <td>
          [font face="Arial, Helvetica" size="-1"]          
          [ul]
            [li]
              [a href="mailto:dave@thebuttlesschaps.com"]Dave
              Gowans[/a] - vocals, acoustic guitar, banjo, air raid
              siren, recorder, Casio&nbsp;
            [/li]

. . .
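
The same pattern can be tried without fetching a remote page. The sketch
below makes two assumptions: preg_replace stands in for ereg_replace
(removed in PHP 7), and $html is a made-up sample string, not the real
page contents.

```php
<?php
// Sketch only. $html is a hypothetical sample, not the page fetched above.
$html = '<table><tr><td><font size="-1"><ul><li>'
      . '<a href="x.html">Dave</a></li></ul></font></td></tr></table>';

// Alternation in parentheses: only the listed tag names are rewritten.
$out = preg_replace('#<(/?(b|u|i|font|ol|ul|li|img|a)( [^>]*)*)>#', '[\1]', $html);

echo $out, "\n";
// <table><tr><td>[font size="-1"][ul][li][a href="x.html"]Dave[/a][/li][/ul][/font]</td></tr></table>
```

The table, tr and td tags pass through untouched because none of those
names appears in the alternation.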


 