Bug #49569 preg_replace global replace bug
Submitted: 2009-09-16 10:42 UTC Modified: 2009-09-24 01:00 UTC
From: thespiraloflife at hotmail dot com Assigned:
Status: No Feedback Package: PCRE related
PHP Version: 5.2.10 OS: linux
Private report: No CVE-ID: None

 [2009-09-16 10:42 UTC] thespiraloflife at hotmail dot com
Description:
------------
preg_replace has a bug

// CASE STUDY: remove multiple instances of &nbsp; inside <table> <div id="Paginator">&nbsp; <span></span>&nbsp;<a href="#"></a> &nbsp; </div></table>
$m = "?"; // Any marker that does not already occur in $strHTML. This is only an example; you should use a function that returns a safe marker.
// Replace every instance of </table> with </table>?
$strHTML = preg_replace('/(<\/table>)/', '$1' . $m, $strHTML);
// Look for every &nbsp; within <table><div id="Paginator"> and remove it.
// This should be the code to use, but PHP seems to be greedy: it does not match all instances and only gets the last &nbsp;, whereas Perl, .NET or JavaScript regular expressions would.
$strHTML = preg_replace('/(<div\sid="Paginator">[^?]*)&nbsp;/', '$1', $strHTML);

Reproduce code:
---------------
---
From manual page: function.preg-replace
---

// CASE STUDY: remove multiple instances of &nbsp; inside <table> <div id="Paginator"></div></table>
$m = "?"; // Any marker that does not already occur in $strHTML. This is only an example; you should use a function that returns a safe marker.
// Replace every instance of </table> with </table>?
$strHTML = preg_replace('/(<\/table>)/', '$1' . $m, $strHTML);
// Look for every &nbsp; within <div id="Paginator"> and remove it.
// This should be the code to use, but PHP seems to be greedy: it does not match all instances and only gets the last &nbsp;, whereas Perl, .NET or JavaScript regular expressions would.
$strHTML = preg_replace('/(<div id="Paginator">[^?]*)&nbsp;/', '$1', $strHTML);

Expected result:
----------------
All instances of &nbsp; inside <div id="Paginator"> are removed.

Actual result:
--------------
Only the last &nbsp; inside <div id="Paginator"> is removed.
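
For reference, the report can be condensed into a short self-contained script along the following lines. This is only a sketch: the sample HTML is taken from the case study comment in the description, the entity is spelled &nbsp; throughout, and the variable names $marked, $result and $count are assumptions that do not appear in the original report. The optional fifth argument of preg_replace is used to count the replacements.

```php
<?php
// Sample input taken from the case study comment in the report.
$strHTML = '<table><div id="Paginator">&nbsp; <span></span>&nbsp;<a href="#"></a> &nbsp; </div></table>';

$m = "?"; // marker assumed not to occur anywhere in $strHTML

// Append the marker after every </table>.
$marked = preg_replace('/(<\/table>)/', '$1' . $m, $strHTML);

// The greedy [^?]* first runs up to the marker, then backtracks until
// "&nbsp;" can match, which happens at the LAST &nbsp; before the marker.
// That one match starts at <div id="Paginator">, so the earlier &nbsp;
// occurrences lie inside it and are never matched on their own.
$result = preg_replace(
    '/(<div\sid="Paginator">[^?]*)&nbsp;/',
    '$1',
    $marked,
    -1,      // no limit on the number of replacements
    $count   // receives the number of replacements actually made
);

echo "replacements: ", $count, "\n";
echo "remaining &nbsp;: ", substr_count($result, '&nbsp;'), "\n";
echo $result, "\n";
```

With this input the pattern matches once, from the opening div tag to the last &nbsp;, so $count is 1 and the two earlier &nbsp; entities are still present in $result.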

 [2009-09-16 12:02 UTC] jani@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.

You do know about this: 
http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

And that PCRE != Perl? 
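
One possible way to get the result the report expects is sketched below. It does not try to make a single pattern match every &nbsp;. It matches each Paginator div once and strips the entity inside a callback. Everything here is an assumption for illustration: the helper name stripNbspInPaginator is hypothetical, the div is assumed to contain no nested <div>, and the code assumes PHP 7.0 or later (closures are not available in the reported 5.2.10).

```php
<?php
// Hypothetical helper; not part of PHP and not from the original report.
function stripNbspInPaginator(string $html): string
{
    $result = preg_replace_callback(
        // Lazy .*? stops at the first </div>; the s modifier lets "."
        // match newlines. Assumes no nested <div> inside the Paginator.
        '/<div\sid="Paginator">.*?<\/div>/s',
        function (array $match): string {
            // Remove every &nbsp; inside the matched div only.
            return str_replace('&nbsp;', '', $match[0]);
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to
    // the unmodified input in that case.
    return $result ?? $html;
}

$strHTML = '<table><div id="Paginator">&nbsp; <span></span>&nbsp;<a href="#"></a> &nbsp; </div></table>&nbsp;';
$clean = stripNbspInPaginator($strHTML);
echo $clean, "\n";
```

Only the entities inside the div are removed; the trailing &nbsp; after </table> in the sample input is left in place, and no marker character is needed.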
 [2009-09-24 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 
Last updated: Fri Apr 19 12:01:27 2024 UTC