Bug #76188 Change in treating unescaped - in character classes
Submitted: 2018-04-05 15:18 UTC
Modified: 2018-04-05 21:44 UTC
From: andi at splitbrain dot org
Assigned:
Status: Not a bug
Package: PCRE related
PHP Version: master-Git-2018-04-05 (Git)
OS: Linux
Private report: No
CVE-ID: None
 [2018-04-05 15:18 UTC] andi at splitbrain dot org
Description:
------------
In regular expressions, an unescaped minus character (-) in a character class is usually used to declare a range. However, when the range makes no sense, it is treated as a literal minus character. This still seems to hold in most cases in 7.3-dev, except for one special combination where the minus sits between a shorthand character class and a dollar sign, e.g. /[\w-$]/. Previous PHP versions would treat this as word characters, minus and dollar. PHP 7.3-dev fails to compile the pattern and emits a warning.

Test script:
---------------
<?php

$tests = [
    '/[\w-$]/',
    '/[\w-]/',
    '/[-\w]/',
    '/[$-]/',
    '/[-$]/',
];

echo PHP_VERSION;
echo "\n";

foreach($tests as $test) {
    echo "$test\n";
    preg_match($test,'');
}

Expected result:
----------------
7.2.3
/[\w-$]/
/[\w-]/
/[-\w]/
/[$-]/
/[-$]/

Actual result:
--------------
7.3.0-dev
/[\w-$]/

Warning: preg_match(): Compilation failed: invalid range in character class at offset 3 in /var/www/test.php on line 16
/[\w-]/
/[-\w]/
/[$-]/
/[-$]/

 [2018-04-05 16:48 UTC] cmb@php.net
That's likely a behavioral difference between PCRE and PCRE2.
 [2018-04-05 21:44 UTC] requinix@php.net
-Status: Open
+Status: Not a bug
 [2018-04-05 21:44 UTC] requinix@php.net
It's PCRE2.

In PCRE,
> An error is generated if a POSIX character class (see below) or an escape sequence other than one that defines a
> single character appears at a point where a range ending character is expected. For example, [z-\xff] is valid, but
> [A-\d] and [A-[:digit:]] are not.
https://www.pcre.org/original/doc/html/pcrepattern.html#SEC9

Note that this only applied to the range end, meaning [\d-A] was allowed. https://3v4l.org/LC0hG
If you update $tests to include [$-\w] then you'll see the error. https://3v4l.org/fk0Ij
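
A quick way to check which of these classes a given PHP build accepts is to compile each pattern and look at the return value. The following is only a minimal sketch and not part of the original test script; the helper name pattern_compiles is hypothetical, and it relies solely on preg_match() returning false when compilation fails.

```php
<?php
// Hypothetical helper: returns true if the pattern compiles on this PHP build.
function pattern_compiles(string $pattern): bool
{
    // preg_match() returns false (and raises a warning) when the pattern
    // cannot be compiled; @ suppresses the warning so that only the
    // return value is inspected.
    return @preg_match($pattern, '') !== false;
}

$candidates = ['/[\w-$]/', '/[$-\w]/', '/[\d-A]/', '/[\w-]/', '/[-\w]/'];

foreach ($candidates as $pattern) {
    printf("%-10s %s\n", $pattern, pattern_compiles($pattern) ? 'ok' : 'compilation failed');
}
```

Which lines report a failure depends on whether the build links against PCRE or PCRE2, so the output is best compared across versions, for example on 3v4l.org.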

In PCRE2,
> Perl treats a hyphen as a literal if it appears before or after a POSIX class (see below) or before or after a
> character type escape such as \d or \H. However, unless the hyphen is the last character in the class, Perl
> outputs a warning in its warning mode, as this is most likely a user error. As PCRE2 has no facility for warning,
> an error is given in these cases.
https://www.pcre.org/current/doc/html/pcre2pattern.html#SEC9

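So in PCRE2 it's now an error to have a character type escape or POSIX class on either side of the hyphen, not just after it. As the test output above shows, a hyphen at the very start or end of the class is still accepted as a literal.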
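
For a pattern that should behave the same under both PCRE and PCRE2, the hyphen can be made unambiguous by escaping it or by moving it to the end of the class. A brief sketch, assuming the intended meaning is "word character, minus or dollar" as described in the report (the variable name $portable is only illustrative):

```php
<?php
// Two rewrites of /[\w-$]/ that avoid an unescaped hyphen between a
// character type escape and a literal.
$portable = [
    '/[\w\-$]/', // hyphen escaped
    '/[\w$-]/',  // hyphen moved to the end of the class
];

foreach ($portable as $pattern) {
    foreach (['a', '-', '$', ' '] as $subject) {
        // preg_match() returns 1 on a match and 0 otherwise.
        printf("%s on %s: %d\n", $pattern, var_export($subject, true), preg_match($pattern, $subject));
    }
}
```

Escaping the hyphen keeps the original character order, while moving it to the end relies on the "last character in the class" rule quoted above.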