Bug #77334 Issue with preg_replace shorthand character in character class
Submitted: 2018-12-21 16:58 UTC Modified: 2018-12-21 17:34 UTC
From: webdev at westernseminary dot edu Assigned:
Status: Not a bug Package: *Regular Expressions
PHP Version: 7.3.0 OS: Windows Server 2012
Private report: No CVE-ID: None

 [2018-12-21 16:58 UTC] webdev at westernseminary dot edu
Description:
------------
When a shorthand character class (\d) is used inside a character class ([\d]) along with a dash and a dot, the regular expression fails in PHP 7.3.0 even though it appears to be valid. The failure does not occur in PHP 7.2.13.

Test script:
---------------
// SHORT EXAMPLE

echo preg_replace('/[^\d-\.]/', '', '123abc');


// LONG EXAMPLE

$value = '123abc';
$rgxs = array(
	'/[\d]/',
	'/[\d-]/', 
	'/[\d\.]/', 
	'/[\d-\.]/', //bug
	'/[\d\.-]/',
	'/[^\d]/', 
	'/[^\d-]/', 
	'/[^\d\.]/',  
	'/[^\d-\.]/',  //bug
	'/[^\w-\.]/',  //bug
	'/[^0-9-\.]/', 
);

echo '<table border="1" cellpadding="10">';
echo '<tr>';
echo '<th>Index</th>';
echo '<th>Regular Expression</th>';
echo '<th>Result</th>';
echo '<th>Last Error</th>';
echo '</tr>';
foreach($rgxs as $i => $rgx) {
	echo '<tr>';
	echo '<td>'.$i.'</td>';
	echo '<td>'.$rgx.'</td>';
	echo '<td>'.preg_replace($rgx, '', $value).'</td>';
	echo '<td>'.preg_last_error().'</td>';
	echo '</tr>';
}
echo '</table>';
 

Expected result:
----------------
For the short example, the expected result would be "123".

Actual result:
--------------
For the short example, the actual result is NULL.

 [2018-12-21 17:17 UTC] danack@php.net
You might want to check the error message settings on your code. That code generates an error for me on PHP 7.3.0: https://3v4l.org/NuLqI

Warning: preg_replace(): Compilation failed: invalid range in character class at offset 4 in /in/NuLqI on line 3

Which explains why it's not working. 
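
If warnings are hidden by the error settings, the failure only shows up as a NULL return value. A minimal sketch of surfacing the warning text regardless of display settings follows; the helper name tryReplace is hypothetical and not part of the original report, and the exact message may vary between PHP and PCRE2 versions.

```php
<?php
// Hypothetical helper: runs preg_replace() with an empty replacement and
// returns [result, warning text or null].
function tryReplace(string $pattern, string $subject): array
{
    $warning = null;
    // Temporarily capture warnings so a compile failure is visible
    // even when display_errors is off.
    set_error_handler(function (int $errno, string $errstr) use (&$warning): bool {
        $warning = $errstr;
        return true; // skip PHP's default error handler
    });
    try {
        $result = preg_replace($pattern, '', $subject);
    } finally {
        restore_error_handler();
    }
    return [$result, $warning];
}

// The pattern from the report: on PHP 7.3.0 this is where the
// "invalid range in character class" warning is raised.
[$badResult, $badWarning] = tryReplace('/[^\d-\.]/', '123abc');
var_dump($badResult);
var_dump($badWarning);

// A pattern without the ambiguous hyphen compiles and runs normally.
[$okResult, $okWarning] = tryReplace('/[^\d]/', '123abc');
var_dump($okResult);
```

preg_replace() returns NULL when an error occurs, so checking for NULL is a reasonable first step before looking at the captured message.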

This is possibly related to the change in the PCRE library used in PHP 7.3, but I'm on a terrible network connection, so it's hard to find the relevant link.


Escaping the hyphen makes it behave how you're expecting it to behave.

echo preg_replace('/[^\d\-\.]/', '', '123-abc');
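
Escaping is one option; placing the hyphen where it cannot form a range should work as well. The following is a small sketch, not taken from the report, comparing three spellings of "anything except digits, hyphen, and dot". The sample subject string is made up for illustration.

```php
<?php
// Three ways to write the class without an unescaped hyphen sitting
// between \d and another character.
$patterns = [
    '/[^\d\-.]/', // hyphen escaped
    '/[^\d.-]/',  // hyphen last in the class
    '/[^-\d.]/',  // hyphen first, right after the negating ^
];

foreach ($patterns as $pattern) {
    // Each call removes the letters and keeps digits, hyphen, and dot.
    echo $pattern, ' => ', preg_replace($pattern, '', '123-abc.45'), PHP_EOL;
}
```

Each pattern should print 123-.45 for this input. The dot does not need a backslash inside a character class, although escaping it is harmless.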
 [2018-12-21 17:28 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2018-12-21 17:28 UTC] requinix@php.net
PHP 7.3 upgraded to PCRE2, which does not allow character type escapes like \d to be used as endpoints of a character range.

https://www.pcre.org/current/doc/html/pcre2pattern.html#SEC9
> Perl treats a hyphen as a literal if it appears before or after a POSIX class
> (see below) or before or after a character type escape such as \d or \H.
> However, unless the hyphen is the last character in the class, Perl outputs a
> warning in its warning mode, as this is most likely a user error. As PCRE2 has
> no facility for warning, an error is given in these cases.
 [2018-12-21 17:30 UTC] webdev at westernseminary dot edu
-Status: Not a bug +Status: Open
 [2018-12-21 17:30 UTC] webdev at westernseminary dot edu
Yes. However, should it throw an error? Isn't that a valid regular expression? Maybe there is a bug in the new preg library? 

If I swap the shorthand \d with 0-9, it works fine without throwing an error. 

echo preg_replace('/[^0-9-\.]/', '', '123abc');
 [2018-12-21 17:34 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2018-12-21 17:34 UTC] requinix@php.net
It's not valid.

It was made an error because Perl would warn about this situation, but PCRE2 has no way to warn. If you disagree with that, you need to file a bug report with the PCRE2 project.

[0-9-.] works because the second hyphen is treated as a literal, just as in [-abc] or [abc-]: it cannot possibly form a character range. In [0-9-.], what comes before the hyphen is already a range, and a range cannot be "chained" into another range.
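
A quick way to see that the second hyphen in [0-9-.] matches a literal "-" is sketched below; the subject string is made up for illustration.

```php
<?php
// "0-9" is a range; the hyphen that follows cannot start another range,
// so it matches a literal "-". The "." is literal inside a class too.
preg_match_all('/[0-9-.]/', 'a-1.b,2', $matches);
print_r($matches[0]); // matched characters: "-", "1", ".", "2"
```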

But \d does not mean [0-9]. It means [0123456789], plus some others when in Unicode mode. And [0123456789-.] is not what you intended when you wrote [\d-.].
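
The difference between \d and [0-9] can be checked with a non-ASCII digit. This is a sketch only: it assumes a PCRE build with Unicode support (as in the bundled library), where the /u modifier also enables Unicode character properties.

```php
<?php
$arabicThree = "\u{0663}"; // ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit

// ASCII digits match \d with or without /u.
$asciiDigit = preg_match('/^\d$/', '3');

// With /u, \d is expected to match other Unicode decimal digits as well.
$unicodeDigit = preg_match('/^\d$/u', $arabicThree);

// The explicit range only ever covers the ten ASCII digits.
$explicitRange = preg_match('/^[0-9]$/u', $arabicThree);

var_dump($asciiDigit, $unicodeDigit, $explicitRange);
```

If the input must be limited to ASCII digits, writing [0-9] explicitly avoids depending on the mode.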
 
Last updated: Tue Sep 17 10:01:29 2024 UTC