Bug #81537 Error in regular expression using negation.
Submitted: 2021-10-18 08:32 UTC
Modified: 2021-12-01 15:11 UTC
Votes: 2
Avg. Score: 4.5 ± 0.5
Reproduced: 2 of 2 (100.0%)
Same Version: 0 (0.0%)
Same OS: 0 (0.0%)
From: jansverre at uffe dot no
Assigned:
Status: Verified
Package: PCRE related
PHP Version: 8.0.11
OS: Ubuntu Linux 20.04
Private report: No
CVE-ID: None
 [2021-10-18 08:32 UTC] jansverre at uffe dot no
Description:
------------
A regular expression that *should* match a text does not match it. In the code sample, the second preg_match_all call matches the link in the string but the first does not, and the only difference is that the first input string contains an extra word 'a'.

Test script:
---------------
$regex = '/(^|[\n\s\>\\";]+)([^ \,"\t\n\r<>;]+\.[^ \,\"\t\n\r<]+)/iu';

$text = 'This is a normal text. http://vg.no, VG';
preg_match_all($regex, $text, $matches);
print_r($matches);

$text = 'This is normal text. http://vg.no, VG';
preg_match_all($regex, $text, $matches);
print_r($matches);

Expected result:
----------------
Both $matches should contain the link http://vg.no

Actual result:
--------------
Only the last $matches contains the link.

 [2021-10-18 08:43 UTC] cmb@php.net
-Status: Open
+Status: Verified
 [2021-10-18 08:43 UTC] cmb@php.net
This is a PCRE JIT issue (<https://3v4l.org/0Em35> vs.
<https://3v4l.org/I8BQ4>).
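
One way to check locally whether the JIT is involved is to run the same script with the pcre.jit ini setting on and off. The following is only a sketch: find_links() is a hypothetical helper that wraps the regex from the report, and preg_last_error_msg() needs PHP 8.0 or later.

```php
<?php
// Hypothetical helper: returns what capture group 2 of the report's regex
// matched in $text (the link candidates).
function find_links(string $text): array
{
    $regex = '/(^|[\n\s\>\\";]+)([^ \,"\t\n\r<>;]+\.[^ \,\"\t\n\r<]+)/iu';
    if (preg_match_all($regex, $text, $matches) === false) {
        throw new RuntimeException(preg_last_error_msg());
    }
    return $matches[2];
}

// pcre.jit is the php.ini switch for PCRE's just-in-time compiler.
printf("pcre.jit = %s\n", var_export(ini_get('pcre.jit'), true));
print_r(find_links('This is a normal text. http://vg.no, VG'));
print_r(find_links('This is normal text. http://vg.no, VG'));
```

Saved under an arbitrary name such as test.php, it can be run as `php -d pcre.jit=1 test.php` and `php -d pcre.jit=0 test.php`. If the two runs print different matches for the first text, the JIT is the likely cause.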
 [2021-12-01 15:11 UTC] cmb@php.net
This is apparently fixed in PCRE2 10.39 (or maybe earlier),
bundled as of PHP 8.1.1.  I'm not sure whether we should update
PHP-8.0 to this PCRE2 version.
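
To see which PCRE2 version a given PHP build uses, the PCRE_VERSION constant can be compared against 10.39. This is a minimal sketch: both function names are hypothetical, and 10.39 is only the version named above as "apparently" containing the fix, not a confirmed threshold.

```php
<?php
// PCRE_VERSION looks like "10.39 2021-10-29"; keep only the version number.
function pcre_library_version(string $raw = PCRE_VERSION): string
{
    return explode(' ', $raw)[0];
}

// True if the PCRE library version is at least $minimum.
function pcre_at_least(string $minimum, string $raw = PCRE_VERSION): bool
{
    return version_compare(pcre_library_version($raw), $minimum, '>=');
}

printf("PCRE %s, at least 10.39: %s\n",
    pcre_library_version(),
    pcre_at_least('10.39') ? 'yes' : 'no');
```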
 [2022-01-04 22:08 UTC] sebastiaodomingos05 at gmail dot com
As far as I can see, this is related to the extra space that is added along with the letter 'a'.
 [2022-09-21 08:24 UTC] antonio at softcodex dot ch
Not sure if it's the same issue.
I have a regex (see below) that doesn't work on PHP 7.3.33 (7.3.33-6+ubuntu20.04.1+deb.sury.org+1), but DOES work on the Windows binary and in the "php:7.3-apache" Docker image.

PHP 8.1.10: only tested the Windows binary.


preg_match( '/<(\w+)[\s\w\-]+ id="S44_i89ew">/', '<br><div id="S44_i89ew">' , $matches );
expected result: 
array(2) { 
[0]=> string(20) "<div id="S44_i89ew">" [1]=> string(2) "di" 
}

actual result:
array(0) { }

Strangely, if you change the + quantifier after [\s\w\-] to * or ?, it works.
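
Note that the + and * variants are not interchangeable even on a build that matches correctly, because they capture different text. With +, the class [\s\w\-] has to consume at least one character before the space, so (\w+) gives back the "v" and captures "di". With *, the class can match nothing and (\w+) keeps "div". A small sketch, using a hypothetical first_capture() helper:

```php
<?php
// Hypothetical helper: first capture group, or null when there is no match.
function first_capture(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[1] : null;
}

$subject = '<br><div id="S44_i89ew">';

// "+" as in the report: a correct engine backtracks and captures "di".
var_dump(first_capture('/<(\w+)[\s\w\-]+ id="S44_i89ew">/', $subject));

// "*" variant: the class may match nothing, so the capture is "div".
var_dump(first_capture('/<(\w+)[\s\w\-]* id="S44_i89ew">/', $subject));
```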
 [2023-01-11 06:29 UTC] MadeleineAshton at jourrapide dot com
Similarly, the negated variant of a character class is written as "[^ ]" (with ^ inside the square brackets). It matches a single character that is not in the specified set of characters. For example, the regular expression [^abc] matches any single character except a, b, or c.
 [2023-01-23 08:28 UTC] ygwgs79hvh at gmail dot com
The claim that the problem is that negation must go inside square brackets is not right. Square brackets are used to match a class of characters (even when negated with a ^), and the class matches a single character.
 [2023-08-09 14:34 UTC] jouni dot makelainen at gmail dot com
I have a similar issue with this (simplified) regexp. I tested it with the PHP Docker images. It works in 7.4-cli and 8.0-cli, but not in 8.1-cli (PCRE 10.39) or 8.2-cli (PCRE 10.40).

It also works if the negated character class is replaced with whitespace.

Test script:
------------
echo preg_match('/(<\w+[^>]+href>)/', '<li><a href>');

Expected result:
----------------
1

Actual result:
--------------
0
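
The two variants mentioned in this comment can be compared side by side. The sketch below assumes the mismatch is JIT-related, as in the original report; the comment does not confirm that. It therefore switches the JIT off before either pattern is first compiled. match_count() is a hypothetical helper, and the whitespace variant is a guess at what "replaced with whitespace" means.

```php
<?php
// Assumption: the failure is caused by the PCRE JIT, so turn it off before
// the patterns are compiled for the first time.
ini_set('pcre.jit', '0');

// Hypothetical helper: 1 if $pattern matches $subject, 0 if it does not.
function match_count(string $pattern, string $subject): int
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        throw new RuntimeException('preg_match failed, error code ' . preg_last_error());
    }
    return $result;
}

$subject = '<li><a href>';
echo match_count('/(<\w+[^>]+href>)/', $subject), "\n"; // negated class, as reported
echo match_count('/(<\w+\s+href>)/', $subject), "\n";   // whitespace variant
```

If the first pattern returns 1 here but 0 once the ini_set() line is removed, the behaviour depends on the JIT.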
Last updated: Thu Nov 21 13:01:29 2024 UTC