Bug #49569 preg_replace global replace bug
Submitted: 2009-09-16 10:42 UTC Modified: 2009-09-24 01:00 UTC
From: thespiraloflife at hotmail dot com Assigned:
Status: No Feedback Package: PCRE related
PHP Version: 5.2.10 OS: linux
Private report: No CVE-ID: None
 [2009-09-16 10:42 UTC] thespiraloflife at hotmail dot com
Description:
------------
preg_replace() does not replace every occurrence in a global replace.

// CASE STUDY: remove every &nbsp; inside <table> <div id="Paginator">&nbsp; <span></span>&nbsp;<a href="#"></a> &nbsp; </div></table>
$m = "?"; // Any marker that does not already occur in $strHTML. This is only an example; use a function that returns a safe marker.
// Replace every </table> with </table>?
$strHTML = preg_replace('/(<\/table>)/', '$1' . $m, $strHTML);
// Find every &nbsp; inside <table><div id="Paginator"> and remove it.
// This should be the code to use, but PHP seems to be greedy: it does not match every instance and only removes the last &nbsp;, whereas Perl, .NET, or JavaScript regular expressions would match them all.
$strHTML = preg_replace('/(\<div\sid\=\"Paginator\"\>[^\?]*)\&nbsp\;/', '$1', $strHTML);

Reproduce code:
---------------
---
From manual page: function.preg-replace
---

// CASE STUDY: remove every &nbsp; inside <table> <div id="Paginator"></div></table>
$m = "?"; // Any marker that does not already occur in $strHTML. This is only an example; use a function that returns a safe marker.
// Replace every </table> with </table>?
$strHTML = preg_replace('/(<\/table>)/', '$1' . $m, $strHTML);
// Find every &nbsp; inside <div id="Paginator"> and remove it.
// This should be the code to use, but PHP seems to be greedy: it does not match every instance and only removes the last &nbsp;, whereas Perl, .NET, or JavaScript regular expressions would match them all.
$strHTML = preg_replace('/(\<div id\=\"Paginator\"\>[^\?]*)\&nbsp\;/', '$1', $strHTML);

Expected result:
----------------
Every instance of &nbsp; inside <div id="Paginator"> is removed.

Actual result:
--------------
Only the last &nbsp; inside <div id="Paginator"> is removed.
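
The report does not include a standalone script, so the following is only a minimal sketch of one, assuming the sample HTML from the comment in the description and assuming the pattern is meant to contain &nbsp;. The input string and the variable names $result and $count are illustrative, not taken from the reporter's code.

```php
<?php
// Hypothetical sample input; the report does not supply the real HTML.
$strHTML = '<table><div id="Paginator">&nbsp; <span></span>&nbsp;<a href="#"></a> &nbsp; </div></table>';
$m = "?"; // marker assumed to be absent from the input, as in the report

// Step 1 from the report: append the marker after every </table>.
$strHTML = preg_replace('/(<\/table>)/', '$1' . $m, $strHTML);

// Step 2 from the report, with the unneeded backslashes dropped.
// $count receives the number of replacements that were performed.
$result = preg_replace('/(<div\sid="Paginator">[^?]*)&nbsp;/', '$1', $strHTML, -1, $count);

echo $count, "\n";  // number of replacements
echo $result, "\n"; // HTML after the replacement
?>
```

In this sketch a single replacement is made. Every match has to begin at the <div id="Paginator"> opening tag, that tag occurs once, and the greedy [^?]* runs up to the last &nbsp; before the marker, so the one available match removes only that last entity.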

History

 [2009-09-16 12:02 UTC] jani@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.

You do know about this: 
http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

And that PCRE != Perl? 
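
As a hedged illustration of the modifiers page linked above: the U modifier (PCRE_UNGREEDY) inverts greediness, which changes which &nbsp; is removed but not how many. The sketch below also shows one possible way to strip every entity inside the block with preg_replace_callback(). It assumes PHP 5.3 or later for the anonymous function, assumes the Paginator div contains no nested <div>, and uses a made-up input string; none of this comes from the reporter.

```php
<?php
// Hypothetical input, borrowed from the comment in the report.
$html = '<table><div id="Paginator">&nbsp; <span></span>&nbsp;<a href="#"></a> &nbsp; </div></table>';

// U flips greediness: the match now ends at the FIRST &nbsp;, but there is
// still only one match, because the <div> opening tag occurs only once.
$ungreedy = preg_replace('/(<div\sid="Paginator">.*)&nbsp;/Us', '$1', $html, -1, $ungreedyCount);

// Alternative: capture the whole block once, then remove every &nbsp;
// from its contents with a plain string replacement.
$clean = preg_replace_callback(
    '/(<div\sid="Paginator">)(.*?)(<\/div>)/s',
    function ($m) {
        // $m[1] = opening tag, $m[2] = contents, $m[3] = closing tag
        return $m[1] . str_replace('&nbsp;', '', $m[2]) . $m[3];
    },
    $html
);

echo $ungreedyCount, "\n";
echo $clean, "\n";
?>
```

The callback form does not need the marker trick, since the lazy .*? stops at the first </div> on its own.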
 [2009-09-24 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 