Doc Bug #41739 Escaped backslash in PCRE regex documentation
Submitted: 2007-06-19 17:34 UTC Modified: 2007-08-17 18:28 UTC
From: trex0003 at umn dot edu Assigned:
Status: Not a bug Package: Documentation problem
PHP Version: Irrelevant OS: CentOS
Private report: No CVE-ID: None
 [2007-06-19 17:34 UTC] trex0003 at umn dot edu
Description:
------------
The current documentation of how PCRE regexes handle escaped backslashes is confusing:

http://php.planetmirror.com/manual/en/reference.pcre.pattern.syntax.php

Perhaps this just means an expansion of the note:

"Note:  Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code."

While trying to use preg_split to break a string at unescaped quote characters, I tried a negative lookbehind assertion (see code below). For every other escaped character in the lookbehind assertion, a single backslash before the character is enough (e.g., "(?<!\s)" works); but to match a backslash, you need three backslashes rather than two (e.g., "(?<!\\\)" works but "(?<!\\)" does not).

Based on the documentation's statement 'In particular, if you want to match a backslash, you write "\\".' it seems that (?<!\\) should work, and the extra information in the Note is too terse to explain why.

Perhaps this is actually a bug, but I can't tell if it's normal behavior that could be clarified more in documentation.

The closest I could find for bug reports to this issue is here:

http://bugs.php.net/bug.php?id=22315

...but this only implies the difference in behavior without explaining why.

Reproduce code:
---------------
$string = "This sentence is not quoted. 'But this one is, and it contains \'escaped quotes\' within!'";

$needle = "/(?<!\\)'/";
$results = preg_split($needle, $string);

print_r($results);

Expected result:
----------------
Array
(
    [0] => This sentence is not quoted. 
    [1] => But this one is, and it contains \'escaped quotes\' within!
    [2] => 
)

// Adding a third backslash to the $needle gives this result.

Actual result:
--------------
Warning: preg_split() [function.preg-split]: Compilation failed: missing ) at offset 7 in /data/domains/lawweb3.law.umn.edu/public/php/regex.php on line 16
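
One way to see where the warning comes from is to inspect the string that PHP hands to PCRE after its own quote processing. This is a minimal sketch; the variable name $seenByPcre is illustrative and not part of the original report.

```php
<?php
// The pattern from the report, written as a double-quoted PHP string.
$needle = "/(?<!\\)'/";

// PHP collapses "\\" to a single backslash before PCRE ever sees the pattern.
// Strip the "/" delimiters to get the regex PCRE actually compiles.
$seenByPcre = substr($needle, 1, -1);

echo $seenByPcre, "\n";         // (?<!\)'
echo strlen($seenByPcre), "\n"; // 7
```

PCRE reads the remaining backslash as escaping the right parenthesis, so the lookbehind group is never closed and compilation stops at the end of the 7-character pattern, which matches the "missing ) at offset 7" message.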

 [2007-08-16 14:21 UTC] vrana@php.net
\ character
\\ PCRE representation of this character
'\\\\' PHP string to write this representation.
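
The three levels listed in this comment can be checked directly. A short sketch, with illustrative variable names that do not appear in the report:

```php
<?php
// Subject: three characters (a, one backslash, b).
$subject = 'a\\b';

// PHP source '\\\\' becomes the two characters \\ which PCRE
// interprets as "match one literal backslash".
$pattern = '/\\\\/';

echo strlen($subject), "\n";               // 3
echo strlen($pattern), "\n";               // 4 (two delimiters plus two backslashes)
echo preg_match($pattern, $subject), "\n"; // 1
```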

 [2007-08-17 17:16 UTC] trex0003 at umn dot edu
Please explain the comments added to this report and how it was determined to be "bogus". The comments added by vrana@php.net to the bug do nothing more than reiterate the use of a single, double and quadruple backslash. My question is about why a triple backslash is necessary in this instance.

The case of four backslashes in PHP makes sense -- the first and third backslashes escape the second and fourth. But when you have three backslashes in a row, doesn't the third backslash act as an escape character to whatever follows? In my example of:

  $needle = "/(?<!\\\)'/";

I would expect that the first backslash would escape the second, and the third would escape the right parenthesis, but that's apparently not what happens, because this particular code WORKS.
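
 [2007-08-17 18:28 UTC] vrana@php.net
From the strings documentation: "if you try to escape any other character, the backslash will be printed too". In "\\\)" the first two backslashes become one, and the trailing \) is not a recognized escape sequence, so it stays as \). PCRE therefore receives \\) which is a literal backslash followed by the closing parenthesis.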
 
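
The rule quoted from the strings documentation can be checked by comparing the three- and four-backslash spellings of the pattern. This is a sketch that reuses the subject string from the report; the names $three and $four are illustrative.

```php
<?php
// Three backslashes: "\\" becomes one backslash, and the unrecognized
// escape "\)" keeps its backslash, so two backslashes reach PCRE.
$three = "/(?<!\\\)'/";

// Four backslashes: each "\\" pair becomes one backslash.
$four = "/(?<!\\\\)'/";

var_dump($three === $four); // bool(true)

$string = "This sentence is not quoted. 'But this one is, and it contains \'escaped quotes\' within!'";

// Split on single quotes that are not preceded by a backslash.
$results = preg_split($four, $string);
print_r($results);
```

Both spellings produce the same pattern, so the three-backslash version works only because of the unrecognized-escape rule. The four-backslash form is arguably the safer choice, since it does not depend on which character happens to follow.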