Doc Bug #41739 Escaped backslash in PCRE regex documentation
Submitted: 2007-06-19 17:34 UTC Modified: 2007-08-17 18:28 UTC
From: trex0003 at umn dot edu Assigned:
Status: Not a bug Package: Documentation problem
PHP Version: Irrelevant OS: CentOS
Private report: No CVE-ID: None
 [2007-06-19 17:34 UTC] trex0003 at umn dot edu
Description:
------------
The current documentation is confusing about how PCRE regexes handle escaped backslashes:

http://php.planetmirror.com/manual/en/reference.pcre.pattern.syntax.php

Perhaps this just needs an expansion of the note:

"Note:  Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code."
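Here is a minimal PHP sketch of what the note describes, assuming a PHP 7 or 8 CLI. The single-quoted literal '\\\\' holds the two characters \\, and PCRE reads those as one literal backslash. The variable name $pattern and the sample subject strings are illustrative only.

```php
<?php
// PCRE matches one literal backslash with the two-character sequence \\ .
// In a PHP string literal each of those backslashes must itself be escaped,
// so the pattern is written with four backslashes.
$pattern = '/\\\\/';

var_dump($pattern);                          // string(4) "/\\/"
var_dump(strlen('\\'));                      // int(1): '\\' is one backslash
var_dump(preg_match($pattern, 'C:\\temp'));  // int(1): subject is C:\temp
```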

While trying to use preg_split to break a string at unescaped quotes, I tried a negative lookbehind assertion (see code below). For every other escaped character in the lookbehind assertion, one backslash before the character is enough (e.g., "(?<!\s)" works); but to match a backslash, three backslashes are needed (e.g., "(?<!\\\)" works but "(?<!\\)" does not).
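The difference between \s and the backslash becomes visible when you print what each double-quoted literal from this paragraph contains after PHP has processed it, since that is the string PCRE receives. This is a minimal sketch assuming PHP 7 or 8; the variable names are hypothetical.

```php
<?php
// "\s" is not a PHP escape sequence, so PHP keeps the backslash: PCRE gets \s.
$ws = "(?<!\s)";
// "\\" is a PHP escape for one backslash, so PCRE gets \) : an escaped,
// literal parenthesis, which leaves the lookbehind group unclosed.
$two = "(?<!\\)";
// "\\" becomes one backslash and "\)" is not a PHP escape, so PCRE gets \\) :
// a literal backslash followed by the closing parenthesis.
$three = "(?<!\\\)";

foreach ([$ws, $two, $three] as $s) {
    echo $s, "\n";
}
// Prints:
// (?<!\s)
// (?<!\)
// (?<!\\)
```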

Based on the documentation's statement 'In particular, if you want to match a backslash, you write "\\".' it seems that (?<!\\) should work, and the extra information in the Note is too terse to explain why.

Perhaps this is actually a bug, but I can't tell whether it is normal behavior that the documentation could explain more clearly.

The closest existing bug report I could find is:

http://bugs.php.net/bug.php?id=22315

...but this only implies the difference in behavior without explaining why.

Reproduce code:
---------------
$string = "This sentence is not quoted. 'But this one is, and it contains \'escaped quotes\' within!'";

$needle = "/(?<!\\)'/";
$results = preg_split($needle, $string);

print_r($results);

Expected result:
----------------
Array
(
    [0] => This sentence is not quoted. 
    [1] => But this one is, and it contains \'escaped quotes\' within!
    [2] => 
)

// Adding a third backslash to the $needle gives this result.

Actual result:
--------------
Warning: preg_split() [function.preg-split]: Compilation failed: missing ) at offset 7 in /data/domains/lawweb3.law.umn.edu/public/php/regex.php on line 16
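The sketch below shows why the reproduce code fails, assuming PHP 7 or 8. Echoing the needle shows the pattern PCRE is given. In that pattern \) is an escaped literal parenthesis, so the lookbehind group never closes. The exact warning text varies with the bundled PCRE version, so the sketch only checks that preg_split() returns false. The @ operator is used only to keep the example quiet.

```php
<?php
$string = "This sentence is not quoted. 'But this one is, and it contains \'escaped quotes\' within!'";

// The reporter's needle, and the pattern PHP actually hands to PCRE.
$needle = "/(?<!\\)'/";
echo $needle, "\n";   // /(?<!\)'/

// \) is a literal ")", so "(?<!" is never closed and compilation fails;
// preg_split() then raises a warning and returns false.
$results = @preg_split($needle, $string);
var_dump($results);   // bool(false)
```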

 [2007-08-16 14:21 UTC] vrana@php.net
\        the character
\\       PCRE representation of this character
'\\\\'   PHP string literal that produces this representation

 [2007-08-17 17:16 UTC] trex0003 at umn dot edu
Please explain the comments added to this report and how it was determined to be "bogus". The comments added by vrana@php.net to the bug do nothing more than reiterate the use of a single, double and quadruple backslash. My question is about why a triple backslash is necessary in this instance.

The case of four backslashes in PHP makes sense -- the first and third backslashes escape the second and fourth. But when you have three backslashes in a row, doesn't the third backslash act as an escape character to whatever follows? In my example of:

  $needle = "/(?<!\\\)'/";

I would expect that the first backslash would escape the second, and the third would escape the right parenthesis, but that's apparently not what happens, because this particular code WORKS.
 [2007-08-17 18:28 UTC] vrana@php.net
"if you try to escape any other character, the backslash will be printed too" in strings documentation.
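Here is a minimal sketch of the rule quoted above, assuming PHP 7 or 8. "\)" is not a recognised escape in a double-quoted string, so PHP leaves the backslash in place, which makes "\\\)" and "\\\\)" the same string. Writing all four backslashes produces the same pattern without relying on that rule, and preg_split() then returns the expected result from the report.

```php
<?php
// "\\" -> one backslash; "\)" is not an escape, so its backslash is kept.
// Three and four backslashes therefore yield the same characters: \\)
var_dump("\\\)" === "\\\\)");   // bool(true)

$string = "This sentence is not quoted. 'But this one is, and it contains \'escaped quotes\' within!'";

// Four backslashes: PHP produces \\ , which PCRE reads as one literal backslash,
// so the lookbehind splits only on quotes not preceded by a backslash.
$needle = "/(?<!\\\\)'/";
print_r(preg_split($needle, $string));
```

The single-quoted literal '/(?<!\\\\)\'/' produces the same pattern.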
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Fri Jul 04 09:01:34 2025 UTC