Bug #79241 Segmentation fault on preg_match()
Submitted: 2020-02-07 15:10 UTC Modified: 2020-02-11 11:19 UTC
From: ivan dot voskoboinyk at gmail dot com Assigned: cmb
Status: Closed Package: PCRE related
PHP Version: 7.4.2 OS: Ubuntu 19.04
Private report: No CVE-ID: None

 [2020-02-07 15:10 UTC] ivan dot voskoboinyk at gmail dot com
Description:
------------
We hit this bug on our production server when matching a regex against a Unicode string decoded from JSON.

I've taken the exact production code and removed everything that is not needed to demonstrate the problem. Removing any remaining part of the snippet below makes the crash go away.

The issue seems to have appeared only in PHP 7.4: https://3v4l.org/DqL76

Test script:
---------------
<?php
// If the "’" string is used directly, without json_decode(),
// the issue does not reproduce.
$text = json_decode('"’"');

$pattern = '/\b/u';

// It takes exactly two calls to preg_match(),
// with the second call starting at byte offset 1 (inside the ’ symbol).
preg_match($pattern, $text, $matches, 0, 0);
preg_match($pattern, $text, $matches, 0, 1);

echo 'OK';

Expected result:
----------------
"OK" is output

Actual result:
--------------
Segmentation fault (core dumped)

History

 [2020-02-07 15:16 UTC] nikic@php.net
-Status: Open +Status: Verified
 [2020-02-07 15:19 UTC] nikic@php.net
-Status: Verified +Status: Analyzed
 [2020-02-07 15:19 UTC] nikic@php.net
Pretty sure this is due to the VALID_UTF8 optimization. We remember that the string is valid UTF-8, but we also need to verify that the offset is at a valid character boundary.
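nikic's diagnosis can be illustrated with a small user-land check. The sketch below is hypothetical: the function name is_utf8_char_boundary() is made up for illustration, and this is not the code of the actual fix in PHP. It tests whether a byte offset lands on a UTF-8 character boundary, assuming the subject is already known to be valid UTF-8. With the input from the test script, offset 1 points into the middle of the three-byte "’", which is the case the optimization failed to reject.

```php
<?php
// Hypothetical helper, not part of PHP: reports whether byte offset $offset
// falls on a UTF-8 character boundary in $subject. It assumes $subject is
// already known to be valid UTF-8 (the situation VALID_UTF8 describes).
function is_utf8_char_boundary(string $subject, int $offset): bool
{
    $length = strlen($subject);
    if ($offset < 0 || $offset > $length) {
        return false; // outside the string
    }
    if ($offset === $length) {
        return true; // the end of the string is always a boundary
    }
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF);
    // any other byte starts a character.
    return (ord($subject[$offset]) & 0xC0) !== 0x80;
}

$text = json_decode('"’"');   // same input as the test script
echo bin2hex($text), "\n";    // e28099: "’" (U+2019) takes three bytes

var_dump(is_utf8_char_boundary($text, 0)); // bool(true)
var_dump(is_utf8_char_boundary($text, 1)); // bool(false): inside "’"
var_dump(is_utf8_char_boundary($text, 3)); // bool(true): end of string
```

A caller could use such a check to skip or reject a preg_match() call whose offset does not sit on a character boundary; it only illustrates the condition being discussed and is not a substitute for the fix in PHP itself.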
 [2020-02-11 11:19 UTC] cmb@php.net
-Status: Analyzed +Status: Closed -Assigned To: +Assigned To: cmb
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Oct 10 20:01:26 2024 UTC