Doc Bug #62360 Five valid PCRE escape sequences not documented
Submitted: 2012-06-19 01:28 UTC Modified: 2016-12-31 00:44 UTC
From: danielklein at airpost dot net Assigned:
Status: Open Package: PCRE related
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None
 [2012-06-19 01:28 UTC] danielklein at airpost dot net
Description:
------------
---
From manual page: http://www.php.net/regexp.reference.escape
---
There are currently five valid PCRE escape sequences not documented on this page: \C, \R, \X, \g & \k. Some of these are documented on other pages.

\g & \k - regexp.reference.back-references
\C - regexp.reference.dot
\R matches line break characters or combinations (see http://nikic.github.com/2011/12/10/PCRE-and-newlines.html)
\X matches Unicode graphemes (see http://www.regular-expressions.info/unicode.html)
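
A minimal sketch of how the sequences listed above behave, assuming a PHP 7+ build whose PCRE library has Unicode support. The variable names and sample strings are only illustrative and are not taken from the manual.

```php
<?php
// \R treats "\r\n", "\n" and "\r" each as a single line break.
$lines = preg_split('/\R/', "a\r\nb\nc\rd");

// \X matches one extended grapheme cluster: "e" plus a combining acute
// accent counts as one unit, so this subject holds two graphemes.
$graphemeCount = preg_match_all('/\X/u', "e\u{0301}a", $graphemes);

// \k<name> is a back reference to a named group.
preg_match('/(?<quote>["\']).*?\k<quote>/', 'say "it\'s" now', $quoted);

// \g{-1} is a relative back reference to the most recently opened group.
preg_match('/(\w)\g{-1}/', 'hello', $doubled);

var_export($lines);
var_export($graphemeCount);
var_export($quoted[0]);
var_export($doubled[0]);
```

\C is left out on purpose: it matches a single code unit (a byte) and can split a multi-byte character in UTF-8 mode, so it is rarely what you want.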

Also, the description for \G is misleading. \G can also match in preg_match_all or preg_replace at the point where the previous match stopped (see example script).
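
For preg_replace, the same behaviour can be sketched as follows. This is an illustrative example with a made-up subject string: \G lets each replacement start only where the previous one ended, so only the leading run of spaces is replaced.

```php
<?php
// Each match must begin exactly where the previous match stopped (or at the
// start of the subject for the first match), so the spaces after "a" and "b"
// are never reached.
$indented = preg_replace('/\G /', '.', '   a b c');
var_export($indented);
```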

Test script:
---------------
$subject = "In the beginning the universe was created";
// Without \G: matches the four words that contain no 'v' and are followed by a space
preg_match_all('/\b([^\Wv]+)\s+/X', $subject, $matches, PREG_PATTERN_ORDER, 1);
var_export($matches[1]);
print("\n");
// With \G at offset 1: no matches
preg_match_all('/\G\b([^\Wv]+)\s+/X', $subject, $matches, PREG_PATTERN_ORDER, 1);
var_export($matches[1]);
print("\n");
// With \G at offset 3: able to match \b at offset 3, then two other words.
// No word can have 'v' in it, and \G prevents skipping to the next word.
preg_match_all('/\G\b([^\Wv]+)\s+/X', $subject, $matches, PREG_PATTERN_ORDER, 3);
var_export($matches[1]);


Actual result:
--------------
array (
  0 => 'the',
  1 => 'beginning',
  2 => 'the',
  3 => 'was',
)
array (
)
array (
  0 => 'the',
  1 => 'beginning',
  2 => 'the',
)

Patches

BUGS (last revision 2014-01-04 10:00 UTC by minktee at hotmail dot com)
patch_CHANGES (last revision 2014-01-04 09:42 UTC by minktee at hotmail dot com)
patch_BUGS (last revision 2014-01-04 09:41 UTC by minktee at hotmail dot com)
Patch_banner (last revision 2014-01-04 09:34 UTC by minktee at hotmail dot com)


History

 [2012-08-02 03:40 UTC] vovan-ve at yandex dot ru
\N is also undocumented. It was introduced in PCRE 8.10 (PHP 5.3.4?).
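
As a small illustration (a sketch with made-up subjects, not text from the manual): \N matches any character except a newline, and unlike the dot it is not affected by the /s modifier.

```php
<?php
// With /s the dot matches a newline, but \N still does not.
$dotMatchesNewline = preg_match('/a.b/s', "a\nb");
$nMatchesNewline   = preg_match('/a\Nb/s', "a\nb");
$nMatchesOther     = preg_match('/a\Nb/s', "a-b");

var_export([$dotMatchesNewline, $nMatchesNewline, $nMatchesOther]);
```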

Some types of condition in conditional subpatterns, (?(condition)...), are also undocumented. All available conditions are:

  n
  +n, -n         (PCRE >= 7.2, PHP >= 5.2.4)
  name           (PCRE >= 6.7, PHP >= 5.2.0)
  <name>, 'name' (PCRE >= 7.0, PHP >= 5.2.2)
  R
  Rn             (PCRE >= 7.0, PHP >= 5.2.2)
  R&name         (PCRE >= 7.0, PHP >= 5.2.2)
  ?assertion
  DEFINE         (PCRE >= 7.0, PHP >= 5.2.2)
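
Three of the condition types above can be sketched like this. The patterns, group names (open, byte) and subjects are hypothetical examples chosen for illustration, not text proposed for the manual.

```php
<?php
// Numeric condition: require ")" only if group 1 (the opening "(") matched.
$numbered = '/^(\()?\d+(?(1)\))$/';
// Named condition, using the <name> form.
$named = '/^(?<open>\()?\d+(?(<open>)\))$/';
// DEFINE condition: the group is never matched in place, only called by name.
$octets = '/(?(DEFINE)(?<byte>25[0-5]|2[0-4]\d|1?\d?\d))^(?&byte)(?:\.(?&byte)){3}$/';

$results = [
    preg_match($numbered, '(42)'),        // both parentheses present
    preg_match($numbered, '42'),          // no parentheses at all
    preg_match($numbered, '(42'),         // opening parenthesis only
    preg_match($named, '(42)'),
    preg_match($named, '42)'),            // closing parenthesis only
    preg_match($octets, '192.168.0.1'),
    preg_match($octets, '1.2.3.256'),     // 256 is out of range
];
var_export($results);
```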

Callouts, (?C) and (?Cn), are also undocumented and not implemented in PHP. The syntax is accepted but does nothing (PCRE >= 4.0).

The subject of the bug is "escape sequences not documented". Should it be renamed to "features not documented"?
 [2012-08-02 03:50 UTC] vovan-ve at yandex dot ru
Also, the recursive forms (?+n) and (?-n) are undocumented (PHP >= 5.2.4, PCRE >= 7.2). The back reference forms \g<n>, \g<-n>, \g'n' and \g'-n' (PHP >= 5.2.7, PCRE >= 7.7) are not mentioned either.
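
A hedged sketch of these forms, with illustrative patterns and subjects. Note that PCRE's own documentation describes \g<n> and \g'n' as subroutine calls (the group's pattern is run again) rather than as back references in the \g{n} sense. The last two lines show the difference.

```php
<?php
// (?-1) recurses into the nearest capturing group opened before it (group 1 here),
// which makes it possible to match balanced parentheses.
preg_match('/(\((?:[^()]++|(?-1))*\))/', 'f(a(b)c)d', $m);
$balancedMatch = $m[0];

// (?+1) calls the next capturing group ahead of it as a subroutine.
$forward = preg_match('/^(?+1)-(\d+)$/', '12-345');

// \g<1> re-runs the pattern of group 1, so any digits are accepted again.
$subroutine = preg_match('/^(\d+)-\g<1>$/', '12-345');
// \g{1} is a true back reference and requires the same text ("12") again.
$backref = preg_match('/^(\d+)-\g{1}$/', '12-345');

var_export([$balancedMatch, $forward, $subroutine, $backref]);
```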
 [2014-01-04 07:48 UTC] minktee at hotmail dot com
Document problem
 [2016-12-31 00:44 UTC] cmb@php.net
-Package: Documentation problem
+Package: PCRE related
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Sep 10 04:01:27 2024 UTC