Bug #79241 Segmentation fault on preg_match()
Submitted: 2020-02-07 15:10 UTC
Modified: 2020-02-11 11:19 UTC
From: ivan dot voskoboinyk at gmail dot com
Assigned: cmb
Status: Closed
Package: PCRE related
PHP Version: 7.4.2
OS: Ubuntu 19.04
Private report: No
CVE-ID: None
 [2020-02-07 15:10 UTC] ivan dot voskoboinyk at gmail dot com
Description:
------------
We hit this bug on our production server when matching a regex against a Unicode string decoded from JSON.

I've taken the exact production code and removed everything that is not necessary to demonstrate the problem. Removing any other part of the snippet below makes the crash go away.

It seems the issue only appeared in PHP 7.4: https://3v4l.org/DqL76

Test script:
---------------
<?php
// If the "’" string is used directly, without json_decode(),
// the issue does not reproduce.
$text = json_decode('"’"');

$pattern = '/\b/u';

// It has to be exactly two calls to preg_match(), with the second
// call using offset 1. The offset is counted in bytes, so it falls
// inside the 3-byte "’" character rather than after it.
preg_match($pattern, $text, $matches, 0, 0);
preg_match($pattern, $text, $matches, 0, 1);

echo 'OK';

Expected result:
----------------
"OK" is output

Actual result:
--------------
Segmentation fault (core dumped)

History

 [2020-02-07 15:16 UTC] nikic@php.net
-Status: Open
+Status: Verified
 [2020-02-07 15:19 UTC] nikic@php.net
-Status: Verified
+Status: Analyzed
 [2020-02-07 15:19 UTC] nikic@php.net
Pretty sure this is due to the VALID_UTF8 optimization. We remember that the string is valid UTF-8, but we also need to verify that the offset is at a valid character boundary.
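To illustrate what "offset at a valid character boundary" means, here is a minimal userland sketch. It is not the fix that went into php-src; `is_utf8_boundary()` is a hypothetical helper name, and it assumes the subject is already valid UTF-8. In UTF-8, continuation bytes always have the bit pattern 10xxxxxx, so an offset is on a boundary when it points at a byte that is not a continuation byte (or at the end of the string).

```php
<?php
// Hypothetical helper, not part of PHP or PCRE: returns true when the byte
// offset points at the first byte of a UTF-8 character or at the end of the
// string. Assumes $subject is already valid UTF-8.
function is_utf8_boundary(string $subject, int $offset): bool
{
    $length = strlen($subject);
    if ($offset < 0 || $offset > $length) {
        return false; // outside the string
    }
    if ($offset === $length) {
        return true; // end of string is always a boundary
    }
    // Continuation bytes look like 10xxxxxx; anything else starts a character.
    return (ord($subject[$offset]) & 0xC0) !== 0x80;
}

$text = json_decode('"’"');   // U+2019 RIGHT SINGLE QUOTATION MARK
$hex  = bin2hex($text);       // "e28099": one character, three bytes

// Offsets 0 and 3 are boundaries; offsets 1 and 2 fall inside the character.
$boundaries = [];
for ($i = 0; $i <= strlen($text); $i++) {
    $boundaries[$i] = is_utf8_boundary($text, $i);
}
```

On an affected version, a caller could use a check like this to guard the second `preg_match()` call from the test script, since offset 1 is not a boundary for this string.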
 [2020-02-11 11:19 UTC] cmb@php.net
-Status: Analyzed
+Status: Closed
-Assigned To:
+Assigned To: cmb
 