php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Doc Bug #30698 preg_replace with /e does not escape single quotes as per documentation
Submitted: 2004-11-06 08:17 UTC Modified: 2005-04-14 21:08 UTC
From: php at richardneill dot org Assigned:
Status: Not a bug Package: Documentation problem
PHP Version: 4.3.9 OS: Linux
Private report: No CVE-ID: None
View Add Comment Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
You can add a comment by following this link or if you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: php at richardneill dot org
New email:
PHP Version: OS:

 

 [2004-11-06 08:17 UTC] php at richardneill dot org
Description:
------------
The documentation for preg_replace states that /e  
will cause it to add extra slashes to single and double 
quotes. 
 
This means that, if one has magic_quotes on, one must 
filter out the spurious new backslashes, using something 
like:  
$block=str_replace(array('\\\\\'','\\\\"'),array('\\\'','\\"'),
$block);  
 
However, in fact, it appears that preg_replace is adding 
the backslashes to double quotes, but NOT to single 
quotes. 
 
There's also a useful comment here: 
http://uk2.php.net/manual/en/function.preg-replace.php 
steven -a-t- acko dot net  08-Feb-2004 05:45  

Reproduce code:
---------------
FAILS:

$message=preg_replace("/((?<=(\n))|(?<=^))( *>(.*))(\n\n|$)/seU",  "'$quote_font_start BEGIN'.fixblock('\\3',$quote).' END $quote_font_end\n\n'",  $message);

WORKS:
			$message=preg_replace("/((?<=(\n))|(?<=^))( *>(.*))(\n\n|$)/seU",  "'$quote_font_start BEGIN'.fixblock(\"\\3\",$quote).' END $quote_font_end\n\n'",  $message);

Expected result:
----------------
I'm not sure whether this is simply a documentation bug, 
but it's very weird behaviour! It is also a nasty one, 
because it can leave the database vulnerable. Thanks for 
your help. 


Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2004-11-10 18:24 UTC] tony2001@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc.

If possible, make the script source available online and provide
an URL to it here. Try avoid embedding huge scripts into the report.

Btw, the example from docs works just fine for me.
 [2004-11-11 01:44 UTC] php at richardneill dot org
I hope this test case is more useful. 

<?
$message="This string contains a single quote ', a double quote \", and a backslash\ .";

function modify($string){
	return "START $string END";
}

$new_bad=preg_replace("/^(.*)$/e", "'BEGINNING '.modify('\\1').' FINISH'",  $message);
$new_good=preg_replace("/^(.*)$/e", "'BEGINNING '.modify(\"\\1\").' FINISH'",  $message);

echo "MESSAGE:\n$message\n\n";
echo "NEW_BAD:\n$new_bad\n\n";
echo "NEW_GOOD:\n$new_good\n\n";
echo "NB: none of these are the same!\n"


/*
The problem is that depending on precisely how the replacements in the preg_replace are quoted,
the escaping is different. This is not documented. $new_bad and $new_good should be the same!

Furthermore, in the case where $message is user-submitted, and has arrived via magic_quotes, ready for
insertion into a database, modify() will want to include a strip_slashes, to remove the redundant slashes (but leaving the original ones). The above behaviour could leave a database vulnerable to sql injection attacks

My personal favoured solution would be for the /e modifier not to add the extra backslashes.
*/
?>
 [2005-04-05 15:46 UTC] vrana@php.net
It works as expected and documented:

modify('\\1') is translated to modify('single quote \', a double quote \", and a backslash\\') resulting in

single quote ', a double quote \", and a backslash\

---

modify("\\1") is translated to modify("single quote \', a double quote \", and a backslash\\") resulting in

single quote \', a double quote ", and a backslash\


 [2005-04-05 16:47 UTC] php at richardneill dot org
I don't think this is exactly bogus, since I think the documentation is not clear. The documentation for /e says just:

--------------------
 If this modifier is set, preg_replace()  does normal substitution of backreferences in the replacement string, evaluates it as PHP code, and uses the result for replacing the search string. Single and double quotes are escaped by backslashes in substituted backreferences.
----------------------

Given that this is a really awkward potential gotcha, especially when it interacts with PHP's magic quotes and SQL, I think it is worth stressing that: 

a)If the backreference is single quoted, eg:
   ('\\1')
then 
   i)Double quotes will become escaped by \
   ii)Unescaped single quotes will cause a parse error.

b)If the backreference is double quoted, eg:
   ("\\1")
then 
   i)single quotes will become escaped by \
   ii)Unescaped double quotes will cause a parse error.

c)If the source is user-input which has had magic-quotes applied, then 
  quotes of type (i) will end up doubly escaped causing an SQL error, and one of the escapes must be removed
  quotes of type (ii) will lose their magic quoting, and need to have the escape restored.
 [2005-04-05 16:59 UTC] vrana@php.net
Unescaped quotes doesn't cause parse error thanks to escaping provided by /e.

<?php
echo preg_replace('~.*~e', '"\\0"', '"'); // ", no parse error
?>

I'm against messing this part with magic_qutes and SQL injection issues.
 [2005-04-05 17:47 UTC] php at richardneill dot org
Sorry - I'm not thinking clearly today - need more coffee! Anyway, the parts (ii) in my previous comment are complete nonsense, but the parts (i) are true. 

I still think that the documentation is wrong: it reads:

"Single AND double quotes are escaped by backslashes in substituted backreferences" 

whereas it should read:

"In single-quoted backreferences, single-quotes are escaped by backslashes; double quotes are not escaped". The reverse applies for double-quoted backreferences. 

------------------

I also think that some sort of warning is important here (even if it's just a link to another page). This is necessary because a double escaped quote becomes an SQL injection issue. Eg:

User writes:   
    Here's a test.
After magic quoting:     
    Here\'s a test.
After preg_replace using ("\\1")
    Here\\'s a test
SQL: 
    $sql="UPDATE table SET value='$input'";
Database query is:
    UPDATE table SET value='Here\\'s a test'
which is parsed as literal \ followed by unescaped '
Which will fail.     

Thus the user thinks that they are always safe because of magic-quotes, but in fact they are NOT.
 [2005-04-05 19:28 UTC] vrana@php.net
The code is escaped, then executed. Read again the post from 5 Apr 3:46pm CEST.
 [2005-04-06 00:41 UTC] php at richardneill dot org
Sorry if I'm being daft here, but my reading of the documentation is that both single and double quotes are always escaped. 

However, as per your post:

modify('\\1') ....resulting in
single quote ', a double quote \", and a backslash\

modify("\\1") ... resulting in
single quote \', a double quote ", and a backslash\

My reading of this is that the first time (single quoted backreference), only the double quotes, but not the single quotes are escaped. The reverse is true the second time.

Perhaps a better sentence for the documentation would be
"Single XOR double quotes are escaped by backslashes in substituted backreferences"

I apologise for getting them the wrong way round earlier!
My main point isn't that it works wrongly, but that the behaviour may be confusing, and that especially for less wizardly programmers such as myself, it would be helpful if it were spelled out a little more. I made a mistake, which I was lucky to catch before I received an SQL injection attack, and that's why I hope that other people will at least have more warning before they err similarly.

Best wishes, 
Richard
 [2005-04-13 09:55 UTC] vrana@php.net
Content of ellipsis in my post is important for you to understand it:
modify("\\1") ... resulting in
 [2005-04-14 21:08 UTC] php at richardneill dot org
I'm sorry - but I think I did understand what you wrote. 
Perhaps the fact that you've needed to explain this to me does indicate that the documentation here could be improved. I do still believe that 2 things would be beneficial:

ii)Clarify this sentence : "Single and double quotes are escaped by backslashes in substituted backreferences." 
[on manual/en/reference.pcre.pattern.modifiers.php]

From this sentence, it is not clear, to me, at any rate, that: 
modify("\\1")  and modify('\\1')
will result in *different* outputs.


ii)A warning (somewhere) that this could potentially permit an SQL injection attack, and the user ought to beware.

I do apologise for bugging you about this: I'm sure you have more important things to worry about!  If you'd prefer, I'll write something about it in the user contributed notes.

Best wishes

Richard
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon May 06 16:01:33 2024 UTC