Bug #3850 ereg_replace broken when pattern string is too long
Submitted: 2000-03-16 16:24 UTC
Modified: 2000-03-16 17:30 UTC
From: brian at phorum dot org
Assigned:
Status: Closed
Package: Misbehaving function
PHP Version: 4.0 Beta 2
OS: Solaris and Windows
Private report: No
CVE-ID: None
 [2000-03-16 16:24 UTC] brian at phorum dot org
This script:
------------------------------
$body='<table>';
$body=ereg_replace("<(/*[b|u|i|font|ol|ul|li|img|a] *[^>]*)>", "[\\1]", $body);
echo $body;

produces:
------------------------------
[table]

That is wrong.

Shorten the pattern:
------------------------------
$body='<table>';
$body=ereg_replace("<(/*[b|u|i] *[^>]*)>", "[\\1]", $body);
echo $body;

That produces:
------------------------------
<table>

As it should. This appears to be a problem in 3.x as well.

 [2000-03-16 17:30 UTC] torben at cvs dot php dot net
(note: sorry about doubling this up on php-dev: brainfart on my part)

No, that behaviour is correct. I think you're misinterpreting what the
[b|u|i|font|ol|ul|li|img|a] is doing. It means 'match any one of the
characters b, u, i, f, o, n, t, l, m, g, a, or |', when what I think you
want is (b|u|i|font|ol|ul|li|img|a), which means 'match any of the
strings b, u, i, font, ol, ul, li, img, or a'.

You are asking the regexp to match:

 o An opening angle bracket, followed by
 o An atom consisting of:
   o zero or more slashes, followed by
   o one of the characters 'buifontlmga|', followed by
   o zero or more spaces, followed by
   o zero or more of anything which isn't a closing angle bracket
 o ...and a closing angle bracket.
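
To make the difference concrete, here is a minimal sketch. It assumes a current PHP, so it uses preg_match() (PCRE) in place of the ereg functions, which were removed in PHP 7.0; bracket expressions and grouped alternation behave the same way for this purpose. The variable names are mine, not from the report.

```php
<?php
// Sketch only: preg_match() (PCRE) stands in for the ereg functions,
// which were removed in PHP 7.0.
$class = 'b|u|i|font|ol|ul|li|img|a';

// A bracket expression is a set of single characters, so repeats collapse.
$members = array_unique(str_split($class));
sort($members, SORT_STRING);
$set = implode('', $members);
echo $set, "\n"; // abfgilmnotu|

// The class consumes exactly one character; the group matches whole strings.
$asClass = '~^[' . $class . ']$~';
$asGroup = '~^(' . $class . ')$~';
foreach (['t', '|', 'font'] as $s) {
    printf("%-4s class=%d group=%d\n", $s, preg_match($asClass, $s), preg_match($asGroup, $s));
}
```

The class accepts 't' and '|' but not 'font'; the group accepts 'font' but neither 't' nor '|'.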

> Shorten the pattern:
> ------------------------------
> $body='<table>';
> $body=ereg_replace("<(/*[b|u|i] *[^>]*)>", "[\\1]", $body);
> echo $body;
> 
> That produces:
> ------------------------------
> <table>
> 
> As it should. This appears to be a problem in 3.x as well.

The reason the second one works is not the length of the regexp, but
rather the fact that the [b|u|i] won't match the first character of
'table', whereas [b|u|i|font|ol|ul|li|img|a] will.
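
A quick way to check that pattern length is irrelevant is to vary only the contents of the bracket expression. This is a rough sketch, again assuming a current PHP with preg_replace() standing in for ereg_replace(); bracketize() is a hypothetical helper made up for this illustration.

```php
<?php
// Sketch only: preg_replace() (PCRE) stands in for ereg_replace(),
// which was removed in PHP 7.0. bracketize() is a hypothetical helper.
function bracketize(string $class, string $html): string
{
    // Splice $class into the bracket expression from the original report.
    return preg_replace('~<(/*[' . $class . '] *[^>]*)>~', '[$1]', $html);
}

// The long class from the report contains 't', so <table> is rewritten.
echo bracketize('b|u|i|font|ol|ul|li|img|a', '<table>'), "\n"; // [table]

// The short class has no 't', so nothing matches.
echo bracketize('b|u|i', '<table>'), "\n"; // <table>

// A longer class that still lacks 't' leaves the input alone as well...
echo bracketize('b|u|i|ol|ul|li|a', '<table>'), "\n"; // <table>

// ...and a one-character class containing 't' is enough to match.
echo bracketize('t', '<table>'), "\n"; // [table]
```

Only the presence of 't' in the class changes the result, not the number of characters in the pattern.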

Here's a modified version and a quick test case which produces
what I think you're trying to achieve:

---8<---test.php---8<---
#!/usr/local/bin/php -q

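<?php
// Fetch the page to convert; file() returns an array of lines.
$filename = 'http://www.thebuttlesschaps.com/index.html';
$infile = file( $filename )
   or die( "Could not open file '$filename' for reading.\n" );

// Collapse the lines into a single string.
$infile = join( '', $infile );

// Rewrite only the listed tags (with optional attributes) as [tag ...].
$newfile = ereg_replace( '<(/?(b|u|i|font|ol|ul|li|img|a)( [^>]*)*)>',
                         '[\1]', $infile );

echo $newfile;
?>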
---8<---test.php---8<---

An example snippet produced by this script: 

. . .

    <table width="100%" border="0" cellpadding="3">
      <tr>
        <td>
          [font face="Arial, Helvetica" size="-1"]          
          [ul]
            [li]
              [a href="mailto:dave@thebuttlesschaps.com"]Dave
              Gowans[/a] - vocals, acoustic guitar, banjo, air raid
              siren, recorder, Casio&nbsp;
            [/li]

. . .
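
For anyone who wants to try this without fetching a remote page, the following is a small self-contained sketch of the same idea. It makes three assumptions that are not in the script above: it uses preg_replace() (PCRE) because ereg_replace() was removed in PHP 7.0, it runs on a made-up inline HTML string, and it writes the attribute part as ( [^>]*)? rather than ( [^>]*)*, which accepts the same text.

```php
<?php
// Sketch only: preg_replace() (PCRE) stands in for ereg_replace(),
// which was removed in PHP 7.0. The sample HTML is made up for the test.
$html = '<table><tr><td><font size="-1"><ul><li>'
      . '<a href="#">Dave</a></li></ul></font></td></tr></table>';

// Grouped alternation: only the listed tag names are rewritten.
// ( [^>]*)? accepts the same attribute text as ( [^>]*)* in the script above.
$pattern = '~<(/?(b|u|i|font|ol|ul|li|img|a)( [^>]*)?)>~';

$out = preg_replace($pattern, '[$1]', $html);
echo $out, "\n";
```

Tags outside the list, such as table, tr, td, or br, pass through unchanged, while the listed tags and their closing forms come out in square brackets.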


 