Bug #81537 Error in regular expression using negation.
Submitted: 2021-10-18 08:32 UTC Modified: 2021-12-01 15:11 UTC
Votes:2
Avg. Score:4.5 ± 0.5
Reproduced:2 of 2 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: jansverre at uffe dot no Assigned:
Status: Verified Package: PCRE related
PHP Version: 8.0.11 OS: Ubuntu Linux 20.04
Private report: No CVE-ID: None
 [2021-10-18 08:32 UTC] jansverre at uffe dot no
Description:
------------
A regular expression that *should* match a text does not match it. In the code sample, the second preg_match_all call matches the link in the string but the first does not, and the only difference is that the first input string contains an extra 'a' (followed by a space).

Test script:
---------------
$regex = '/(^|[\n\s\>\\";]+)([^ \,"\t\n\r<>;]+\.[^ \,\"\t\n\r<]+)/iu';

$text = 'This is a normal text. http://vg.no, VG';
preg_match_all($regex, $text, $matches);
print_r($matches);

$text = 'This is normal text. http://vg.no, VG';
preg_match_all($regex, $text, $matches);
print_r($matches);

Expected result:
----------------
Both $matches should contain the link http://vg.no

Actual result:
--------------
Only the last $matches contains the link.

History

 [2021-10-18 08:43 UTC] cmb@php.net
-Status: Open +Status: Verified
 [2021-10-18 08:43 UTC] cmb@php.net
This is a PCRE JIT issue (<https://3v4l.org/0Em35> vs.
<https://3v4l.org/I8BQ4>).
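
A minimal sketch of how the JIT can be ruled in or out for the reported pattern, assuming the `pcre.jit` INI setting is switched off before the pattern is first compiled (PHP caches compiled patterns, so toggling it later may not affect a regex that is already cached). The pattern and text are taken from the test script above; nothing else is assumed:

```php
<?php
// Turn the PCRE JIT off for this request, before any pattern is compiled.
ini_set('pcre.jit', '0');

$regex = '/(^|[\n\s\>\\";]+)([^ \,"\t\n\r<>;]+\.[^ \,\"\t\n\r<]+)/iu';
$text = 'This is a normal text. http://vg.no, VG';

preg_match_all($regex, $text, $matches);

// Group 2 holds the link candidates found by the PCRE interpreter.
print_r($matches[2]);
```

If the link shows up here but not with the default settings, the difference points at the JIT rather than at the pattern itself.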
 [2021-12-01 15:11 UTC] cmb@php.net
This is apparently fixed in PCRE2 10.39 (or maybe earlier),
bundled as of PHP 8.1.1.  I'm not sure whether we should update
PHP-8.0 to this PCRE2 version.
 [2022-01-04 22:08 UTC] sebastiaodomingos05 at gmail dot com
As far as I can see, it is related to the extra space that comes with the added letter 'a' rather than to the letter itself.
 [2022-09-21 08:24 UTC] antonio at softcodex dot ch
Not sure if it's the same issue.
I have this regex (see below) that doesn't work on

PHP 7.3.33:
 7.3.33-6+ubuntu20.04.1+deb.sury.org+1
but does work with the Windows binary and the "php:7.3-apache" Docker image.

PHP 8.1.10: only tested with the Windows binary.

preg_match( '/<(\w+)[\s\w\-]+ id="S44_i89ew">/', '<br><div id="S44_i89ew">' , $matches );
Expected result:
array(2) {
[0]=> string(20) "<div id="S44_i89ew">" [1]=> string(2) "di"
}

Actual result:
array(0) { }

Strangely, if you change the + quantifier after [\s\w\-] to * or ?, it works.
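
For reference, a small sketch of what each quantifier variant should capture when the JIT is out of the picture (assumption: `pcre.jit` is disabled before the first match; the `$variants` and `$captures` names are only for illustration). With `+` the class has to consume at least one character, so `(\w+)` gives back the "v" and captures "di"; with `*` or `?` the class can match nothing and the group keeps "div":

```php
<?php
// Use the PCRE interpreter only, so the result does not depend on the JIT.
ini_set('pcre.jit', '0');

$subject = '<br><div id="S44_i89ew">';
$variants = ['+', '*', '?'];
$captures = [];

foreach ($variants as $q) {
    // Same pattern as in the comment above, with only the quantifier swapped.
    $pattern = '/<(\w+)[\s\w\-]' . $q . ' id="S44_i89ew">/';
    $captures[$q] = preg_match($pattern, $subject, $m) === 1 ? $m[1] : null;
}

var_dump($captures);
```

A null entry for `+` on an affected build, while `*` and `?` still capture "div", would match the behaviour described in this comment.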
 [2023-01-11 06:29 UTC] MadeleineAshton at jourrapide dot com
Similarly, the negated form of a character class is written "[^ ]" (with ^ immediately inside the square brackets); it matches a single character that is not in the specified set. For example, the regular expression [^abc] matches any single character except a, b, or c.
 [2023-01-23 08:28 UTC] ygwgs79hvh at gmail dot com
The claim that in regular expressions the negation must go inside square brackets is not right. Square brackets define a character class (even when negated with ^), and a class matches a single character.
 [2023-08-09 14:34 UTC] jouni dot makelainen at gmail dot com
I have a similar issue with this (simplified) regexp. I tested it with the PHP Docker images: it works in 7.4-cli and 8.0-cli, but not in 8.1-cli (PCRE 10.39) or 8.2-cli (PCRE 10.40).

It also works if the negation is replaced with whitespace.

Test script:
------------
echo preg_match('/(<\w+[^>]+href>)/', '<li><a href>');

Expected result:
----------------
1

Actual result:
--------------
0
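
When comparing builds like this, it may help to print the PCRE library version and the JIT setting next to the result. A rough sketch, assuming the check is run with `pcre.jit` switched off so that it shows the interpreter's answer (the `$result` and `$m` names are only for illustration):

```php
<?php
// Disable the JIT first so the match below is done by the interpreter.
ini_set('pcre.jit', '0');

// PCRE_VERSION is the version string of the PCRE library PHP was built with.
printf("PCRE version: %s\n", PCRE_VERSION);
printf("pcre.jit: %s\n", var_export(ini_get('pcre.jit'), true));

$result = preg_match('/(<\w+[^>]+href>)/', '<li><a href>', $m);

if ($result === false) {
    // A false return means the engine reported an error, not "no match".
    printf("preg error code: %d\n", preg_last_error());
} else {
    printf("result: %d\n", $result);
}
```

Running the same script once with the default settings and once as written makes it easier to tell a JIT problem from a pattern problem on a given image.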
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 18:01:29 2024 UTC