Bug #79241 Segmentation fault on preg_match()
Submitted: 2020-02-07 15:10 UTC Modified: 2020-02-11 11:19 UTC
From: ivan dot voskoboinyk at gmail dot com Assigned: cmb (profile)
Status: Closed Package: PCRE related
PHP Version: 7.4.2 OS: Ubuntu 19.04
Private report: No CVE-ID: None
 [2020-02-07 15:10 UTC] ivan dot voskoboinyk at gmail dot com
Description:
------------
We hit this bug on our production server when matching a regex against a Unicode string decoded from JSON.

I've taken the exact production code and removed everything that is not necessary to demonstrate the problem. Removing any other part of the snippet below makes the crash go away.

It seems the issue only appeared in PHP 7.4: https://3v4l.org/DqL76

Test script:
---------------
<?php
// If the "’" string is used directly, without json_decode(),
// the issue does not reproduce.
$text = json_decode('"’"');

$pattern = '/\b/u';

// It takes exactly two calls to preg_match(); the second call
// uses byte offset 1, which falls inside the multi-byte "’" character.
preg_match($pattern, $text, $matches, 0, 0);
preg_match($pattern, $text, $matches, 0, 1);

echo 'OK';

Expected result:
----------------
"OK" is output

Actual result:
--------------
Segmentation fault (core dumped)

History

 [2020-02-07 15:16 UTC] nikic@php.net
-Status: Open +Status: Verified
 [2020-02-07 15:19 UTC] nikic@php.net
-Status: Verified +Status: Analyzed
 [2020-02-07 15:19 UTC] nikic@php.net
Pretty sure this is due to the VALID_UTF8 optimization. We remember that the string is valid UTF-8, but we also need to verify that the offset is at a valid character boundary.
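To illustrate what "valid character boundary" means here, the following is a minimal userland sketch, not the actual fix in the PCRE extension. The helper name isUtf8CharBoundary() is hypothetical, and it assumes the subject is already known to be valid UTF-8. In UTF-8, continuation bytes have the bit pattern 10xxxxxx, so an offset is on a boundary only if the byte at that position is not a continuation byte (or the offset equals the string length).

```php
<?php
// Hypothetical helper for illustration; not part of PHP or PCRE.
// Assumes $subject is valid UTF-8 and $offset is a byte offset.
function isUtf8CharBoundary(string $subject, int $offset): bool
{
    $length = strlen($subject);
    if ($offset < 0 || $offset > $length) {
        return false; // outside the string entirely
    }
    if ($offset === $length) {
        return true; // the end of the string is always a boundary
    }
    // Continuation bytes look like 10xxxxxx; a character never starts on one.
    return (ord($subject[$offset]) & 0xC0) !== 0x80;
}

// U+2019 (the "’" from the test script) is the three bytes E2 80 99.
$text = "\xE2\x80\x99";
var_dump(isUtf8CharBoundary($text, 0)); // bool(true)
var_dump(isUtf8CharBoundary($text, 1)); // bool(false)
var_dump(isUtf8CharBoundary($text, 3)); // bool(true)
```

Offset 1 in the test script lands on the second byte of the character, which is why remembering only "this string is valid UTF-8" is not enough to skip validation for the second call.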
 [2020-02-11 11:19 UTC] cmb@php.net
-Status: Analyzed +Status: Closed -Assigned To: +Assigned To: cmb
 
Last updated: Sun Sep 19 15:03:37 2021 UTC