Bug #76188 Change in treating unescaped - in character classes
Submitted: 2018-04-05 15:18 UTC Modified: 2018-04-05 21:44 UTC
From: andi at splitbrain dot org Assigned:
Status: Not a bug Package: PCRE related
PHP Version: master-Git-2018-04-05 (Git) OS: Linux
Private report: No CVE-ID: None

 [2018-04-05 15:18 UTC] andi at splitbrain dot org
Description:
------------
In regular expressions, an unescaped minus character (-) in a character class usually declares a range. When the range makes no sense, however, it is treated as a literal minus character. This still seems to hold in most cases in 7.3-dev, except for one combination: a minus between a shorthand character class and a dollar sign, e.g. /[\w-$]/. Previous PHP versions treated this as word characters, minus, and dollar. PHP 7.3-dev emits a compilation warning instead.

Test script:
---------------
<?php

$tests = [
    '/[\w-$]/',
    '/[\w-]/',
    '/[-\w]/',
    '/[$-]/',
    '/[-$]/',
];

echo PHP_VERSION;
echo "\n";

foreach ($tests as $test) {
    echo "$test\n";
    preg_match($test, '');
}

Expected result:
----------------
7.2.3
/[\w-$]/
/[\w-]/
/[-\w]/
/[$-]/
/[-$]/

Actual result:
--------------
7.3.0-dev
/[\w-$]/

Warning: preg_match(): Compilation failed: invalid range in character class at offset 3 in /var/www/test.php on line 16
/[\w-]/
/[-\w]/
/[$-]/
/[-$]/

 [2018-04-05 16:48 UTC] cmb@php.net
That's likely a behavioral difference between PCRE and PCRE2.
 [2018-04-05 21:44 UTC] requinix@php.net
-Status: Open
+Status: Not a bug
 [2018-04-05 21:44 UTC] requinix@php.net
It's PCRE2.

In PCRE,
> An error is generated if a POSIX character class (see below) or an escape sequence other than one that defines a
> single character appears at a point where a range ending character is expected. For example, [z-\xff] is valid, but
> [A-\d] and [A-[:digit:]] are not.
https://www.pcre.org/original/doc/html/pcrepattern.html#SEC9

Note that this restriction only applied to the range end, meaning [\d-A] was allowed. https://3v4l.org/LC0hG
If you update $tests to include [$-\w] then you'll see the error. https://3v4l.org/fk0Ij
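To compare the two sides of the hyphen without reading warnings off the output, something like the following could be used. It is only a sketch: patternCompiles() is a hypothetical helper, not a PHP function, and it relies on preg_match() returning false when the pattern fails to compile. The results for /[\w-$]/ depend on which PCRE version PHP was built against, so no output is shown.

```php
<?php
// Hypothetical helper (not part of PHP): true if PCRE can compile $pattern.
// preg_match() returns false on a compilation failure; @ hides the warning.
function patternCompiles(string $pattern): bool
{
    return @preg_match($pattern, '') !== false;
}

$tests = [
    '/[\w-$]/',  // escape before the hyphen
    '/[$-\w]/',  // escape after the hyphen (range end)
    '/[\w-]/',   // hyphen is the last character
    '/[-\w]/',   // hyphen is the first character
];

echo PHP_VERSION, "\n";
foreach ($tests as $test) {
    echo $test, ' => ', patternCompiles($test) ? 'compiles' : 'compilation failed', "\n";
}
```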

In PCRE2,
> Perl treats a hyphen as a literal if it appears before or after a POSIX class (see below) or before or after a
> character type escape such as \d or \H. However, unless the hyphen is the last character in the class, Perl
> outputs a warning in its warning mode, as this is most likely a user error. As PCRE2 has no facility for warning,
> an error is given in these cases.
https://www.pcre.org/current/doc/html/pcre2pattern.html#SEC9

It's now an error to have a POSIX class or character type escape on either side of the hyphen (unless the hyphen ends the class), not just after it.
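For patterns that need to work under both libraries, the usual way out is to make the hyphen unambiguous: escape it, or move it to the start or end of the class. A minimal sketch, assuming the intent of /[\w-$]/ was "word characters, minus, and dollar"; the $portable and $failures names are just for illustration:

```php
<?php
// Three rewrites of /[\w-$]/ in which the hyphen cannot be read as a range.
$portable = [
    '/[\w\-$]/',  // hyphen escaped
    '/[\w$-]/',   // hyphen moved to the end
    '/[-\w$]/',   // hyphen moved to the start
];

$failures = [];
foreach ($portable as $pattern) {
    // Each rewrite should match a word character, a minus, and a dollar sign.
    foreach (['a', '-', '$'] as $char) {
        if (preg_match($pattern, $char) !== 1) {
            $failures[] = "$pattern did not match '$char'";
        }
    }
    // A space is none of those, so it should not match.
    if (preg_match($pattern, ' ') !== 0) {
        $failures[] = "$pattern unexpectedly matched a space";
    }
}

echo $failures ? implode("\n", $failures) . "\n" : "all patterns behave as intended\n";
```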
 