php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #77193 Infinite loop in preg_replace_callback
Submitted: 2018-11-23 13:55 UTC Modified: 2018-12-01 17:30 UTC
From: mlocati at gmail dot com Assigned: ab (profile)
Status: Closed Package: PCRE related
PHP Version: 7.3.0RC6 OS:
Private report: No CVE-ID: None
 [2018-11-23 13:55 UTC] mlocati at gmail dot com
Description:
------------
While testing the develop branch of concrete5 for PHP 7.3 compatibility, I've seen that the code at https://github.com/concrete5/concrete5/blob/27bcd1b36ef35f21b0df32bfd7e57d2ecf347d65/concrete/src/Editor/LinkAbstractor.php#L97-L109 loops forever.

Here's a copy of the code:

$text = preg_replace_callback(
    '/{CCM:CID_([0-9]+)}/i',
    function ($matches) {
        $cID = $matches[1];
        if ($cID > 0) {
            $c = Page::getByID($cID, 'ACTIVE');
            if ($c->isActive()) {
                return (string) \URL::to($c);
            }
        }
    },
    $text
);

If I comment the "return (string) \URL::to($c);" line, everything works fine, but with that line the closure is called an infinite number of times.

I tried hard but I really can't replicate this issue in a simpler test case.

I tested it both on Windows (PHP 7.3.0RC6 32bit TS) and on the php:7.3.0RC5-cli-stretch docker image (sorry, no RC6 available yet).

To replicate the problem in the same environment as mine, you can run the php:7.3.0RC5-cli-stretch docker image ("docker run --rm -it php:7.3.0RC5-cli-stretch bash") and call these shell commands:

# Update the APT repository & install required packages
apt update && apt install -y mysql-server git unzip

# Setup MySQL/MariaDB
service mysql start
mysql -e "CREATE USER 'travis'@'localhost' IDENTIFIED BY ''; GRANT ALL PRIVILEGES ON * . * TO 'travis'@'localhost'; FLUSH PRIVILEGES;"

# Build and enable GD and PDO_MYSQL
curl https://raw.githubusercontent.com/mlocati/docker-php-extension-installer/master/install-php-extensions -o /usr/local/bin/install-php-extensions
chmod +x /usr/local/bin/install-php-extensions
install-php-extensions gd pdo_mysql

# Install Composer
php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"
php -r "if (hash_file('sha384', 'composer-setup.php') === '93b54496392c062774670ac18b134c3b3a95e5a5e5c8f1a9f115f203b75bf9a129d5daa8ba6a13e2cc8a1da0806388a8') { echo 'Installer verified'; } else { echo 'Installer corrupt'; unlink('composer-setup.php'); } echo PHP_EOL;"
php composer-setup.php
php -r "unlink('composer-setup.php');"
chmod a+x composer.phar
mv composer.phar /usr/local/bin/composer

# Clone repository
git clone --depth=1 https://github.com/concrete5/concrete5.git
cd concrete5

# Patch required for making the tests work
sed -i -e 's/localhost/127.0.0.1/g' tests/config/database.php

# Install composer dependencies
composer install

# Execute the test
composer test -- --filter=ContentPageTranslateTest::testFrom



The test will last forever (well, actually until PHP goes out of memory). On previous PHP versions everything works fine.


Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2018-11-25 12:37 UTC] cmb@php.net
Does this happen regardless of the value of pcre.jit?  Also, does
this happen if you actually return an empty string from the
callback instead of the implicit NULL?
 [2018-11-25 19:21 UTC] mlocati at gmail dot com
I tried both with
pcre.jit=0
and with
pcre.jit=1

in php.ini.

The infinite loop also occurs when I remove the "return" instruction from "return (string) \URL::to($c);", and always returning an empty string, that is:

$text = preg_replace_callback(
    '/{CCM:CID_([0-9]+)}/i',
    function ($matches) {
        $cID = $matches[1];
        if ($cID > 0) {
            $c = Page::getByID($cID, 'ACTIVE');
            if ($c->isActive()) {
                (string) \URL::to($c);
            }
        }
        return '';
    },
    $text
);
 [2018-11-27 15:05 UTC] ab@php.net
This doesn't look like a PCRE issue. URL::to() seems to block the callback, so then it never returns. The actual issue should be somewhere later in the call stack starting from URL::to(), something like an endless loop or unterminated recursive call.

I'm going to check the suggested repro way with the docker container next days otherwise.

Thanks.
 [2018-11-27 15:56 UTC] mlocati at gmail dot com
> This doesn't look like a PCRE issue. URL::to() seems to block the callback,
> so then it never returns. The actual issue should be somewhere later in
> the call stack starting from URL::to(), something like an endless loop or
> unterminated recursive call.

I modified the code a bit to show what's happening.

With this code:

ob_end_clean(); // Required because PHPUnit enables output buffering
echo 'Method: ', __METHOD__, "\n";
echo 'Text: ', $text, "\n";
$text = preg_replace_callback(
    '/{CCM:CID_([0-9]+)}/i',
    function ($matches) {
        print_r($matches);
        $cID = $matches[1];
        if ($cID > 0) {
            $c = Page::getByID($cID, 'ACTIVE');
            if ($c->isActive()) {
                return (string) \URL::to($c);
            }
        }
    },
    $text
);


The output is the following:

Method: Concrete\Core\Editor\LinkAbstractor::translateFrom
Text: <a href="{CCM:CID_3}">Super Cool!</a>
Array
(
    [0] => {CCM:CID_3}
    [1] => 3
)
Array
(
    [0] => {CCM:CID_3}
    [1] => 3
)
Array
(
    [0] => {CCM:CID_3}
    [1] => 3
)
[...endless list of Array()...]

As you can see, the \URL::to() call is not blocking the execution, but the callback is called forever even if we have just one match.
 [2018-11-28 20:32 UTC] ab@php.net
-Status: Open +Status: Feedback
 [2018-11-28 20:32 UTC] ab@php.net
@mlocati thanks, with the docker way reproduced it as well.

I still can't repro it on a reduced snippet. I guess, the method call inside the callback somehow resets the match context (so somewhere in the call trace is a PCRE API call), thus preventing loop termination. Could you please check, whether this fixes it?

diff --git a/ext/pcre/php_pcre.c b/ext/pcre/php_pcre.c
index 5165209b85..f8fc721d81 100644
--- a/ext/pcre/php_pcre.c
+++ b/ext/pcre/php_pcre.c
@@ -1880,6 +1880,9 @@ matched:

                        new_len = result_len + offsets[0] - start_offset; /* part before the match */

+                       /* Advance to the next piece. */
+                       start_offset = offsets[1];
+
                        /* Use custom function to get replacement string and its length. */
                        eval_result = preg_do_repl_func(fci, fcc, subject, offsets, subpat_names, count,
                                pcre2_get_mark(match_data));
@@ -1908,9 +1911,6 @@ matched:

                        limit--;

-                       /* Advance to the next piece. */
-                       start_offset = offsets[1];
-
                        /* If we have matched an empty string, mimic what Perl's /g options does.
                           This turns out to be rather cunning. First we set PCRE2_NOTEMPTY_ATSTART and try
                           the match again at the same point. If this fails (picked up above) we
 [2018-11-29 09:04 UTC] mlocati at gmail dot com
-Status: Feedback +Status: Open
 [2018-11-29 09:04 UTC] mlocati at gmail dot com
@ab Yes! That patch fixed the issue:


C:\MyPath>composer test -- --filter=ContentPageTranslateTest::testFrom
> phpunit "--filter=ContentPageTranslateTest::testFrom"
PHPUnit 4.8.36 by Sebastian Bergmann and contributors.

Runtime:        PHP 7.3.0-dev
Configuration:  C:\MyPath\phpunit.xml

.

Time: 4.68 seconds, Memory: 30.00MB

OK (1 test, 1 assertion)


Great job, Anatol!
 [2018-11-29 11:34 UTC] cmb@php.net
I wonder whether we don't have to call pcre2_get_mark() before
calling pcre2_get_ovector_pointer(), i.e. partly reverting commit
b81d712.

I mean something like this instead the patch above:

 ext/pcre/php_pcre.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/ext/pcre/php_pcre.c b/ext/pcre/php_pcre.c
index 5165209b85..4277ffb6aa 100644
--- a/ext/pcre/php_pcre.c
+++ b/ext/pcre/php_pcre.c
@@ -1792,6 +1792,7 @@ static zend_string *php_pcre_replace_func_impl(pcre_cache_entry *pce, zend_strin
 	char			*match,				/* The current match */
 					*piece;				/* The current piece of subject */
 	size_t			result_len; 		/* Length of result */
+	PCRE2_SPTR		mark = NULL;		/* Target for MARK name */
 	zend_string		*result;			/* Result of replacement */
 	zend_string     *eval_result;		/* Result of custom function */
 	pcre2_match_data *match_data;
@@ -1852,6 +1853,8 @@ static zend_string *php_pcre_replace_func_impl(pcre_cache_entry *pce, zend_strin
 	while (1) {
 		piece = subject + start_offset;
 
+		mark = pcre2_get_mark(match_data);
+
 		if (count >= 0 && limit) {
 			/* Check for too many substrings condition. */
 			if (UNEXPECTED(count == 0)) {
@@ -1881,8 +1884,7 @@ matched:
 			new_len = result_len + offsets[0] - start_offset; /* part before the match */
 
 			/* Use custom function to get replacement string and its length. */
-			eval_result = preg_do_repl_func(fci, fcc, subject, offsets, subpat_names, count,
-				pcre2_get_mark(match_data));
+			eval_result = preg_do_repl_func(fci, fcc, subject, offsets, subpat_names, count, mark);
 
 			ZEND_ASSERT(eval_result);
 			new_len = zend_safe_address_guarded(1, ZSTR_LEN(eval_result), new_len);


[1] <http://git.php.net/?p=php-src.git;a=commit;h=b81d712961aa3cbc64dc8bb521f2427cf443e550>
 [2018-11-29 13:33 UTC] mlocati at gmail dot com
@cmb I just tried your patch instead of the @ab one, but with it we still have the infinite loop...
 [2018-11-29 13:39 UTC] mlocati at gmail dot com
Just for the records, I'm compiling a 32-bit PHP under Windows with

configure --disable-all --with-all-shared --enable-cli --enable-phar --enable-json --enable-filter --with-iconv --enable-pdo --with-mysqli --with-mysqlnd --with-pdo-mysql --with-openssl --with-simplexml --with-libxml --with-gd --enable-mbstring --enable-tokenizer --with-dom --with-xml --enable-xmlreader --enable-xmlwriter --enable-hash  --enable-fileinfo --enable-session --enable-zip

nmake
 [2018-11-29 22:54 UTC] cmb@php.net
-Status: Open +Status: Assigned -Assigned To: +Assigned To: ab
 [2018-11-29 22:54 UTC] cmb@php.net
Thanks for testing, Michele!  Then we should go with Anatol's
patch.  Not sure if it should go into PHP-7.3.0.
 [2018-11-30 15:07 UTC] ab@php.net
Thanks for testing, Michele.

Chistoph, I'll be preparing a patch then. Still lack on a simpler repro case :/

Thanks
 [2018-11-30 16:41 UTC] mlocati at gmail dot com
It took me quite some time to keep removing stuff to get to a test script as simple as possible, and here it is:

<?php
$text = '{CCM:CID_2}';
echo '1';
preg_replace_callback(
    '/([0-9]+)/i',
    function ($matches) {
        echo $matches[1];
        filter_var('http', FILTER_VALIDATE_REGEXP, ['options' => ['regexp' => '/^http$/i']]);
    },
    $text
);
echo '3';


It should print '123', but it prints '1' and an infinite number of '2'.

If you set $text to just a digit (eg '2'), just '12' is printed out, and PHP quits with an error level -1073741819 (I'm using a 32-bit PHP).
 [2018-11-30 16:46 UTC] mlocati at gmail dot com
PS: I'm on Windows, so -1073741819 is the 0xC0000005 (STATUS_ACCESS_VIOLATION) system error
 [2018-11-30 16:54 UTC] requinix@php.net
What happens with this?

<?php
$text = '{CCM:CID_2}';
echo '1';
preg_replace_callback(
    '/([0-9]+)/i',
    function ($matches) {
        echo $matches[1];
        preg_match('/^http$/i', '');
    },
    $text
);
echo '3';
 [2018-11-30 21:56 UTC] ab@php.net
@mlocati, excellent! I'll add it as a test.

@requinix your snippet doesn't repro, unfortunately.

Thanks.
 [2018-11-30 22:42 UTC] requinix@php.net
> @requinix your snippet doesn't repro, unfortunately.
That was to test if the problem was with the PCRE2 code in general or if it was specifically about how ext/filter and FILTER_VALIDATE_REGEXP uses it.
 [2018-12-01 09:29 UTC] ab@php.net
Automatic comment on behalf of ab
Revision: http://git.php.net/?p=php-src.git;a=commit;h=ef1269d5c158afaf052e3d79591462e0f5372b1a
Log: Fixed bug #77193 Infinite loop in preg_replace_callback
 [2018-12-01 09:29 UTC] ab@php.net
-Status: Assigned +Status: Closed
 [2018-12-01 10:17 UTC] mlocati at gmail dot com
Will this fix be include in PHP 7.3.0?
 [2018-12-01 13:56 UTC] cmb@php.net
> Will this fix be include in PHP 7.3.0?

Good question!  Anatol, what do you think?
 [2018-12-01 16:17 UTC] ab@php.net
AFM this should be handled as a usual bug and the fix should go through QA. I woludn't expect any complications on this, still it's unfortunate it has been caught this late. It's anyway at your convenience, Christoph, I might not see the whole picture.

Thanks.
 [2018-12-01 17:30 UTC] cmb@php.net
Thanks, Anatol.  Then I agree that we better ship this as of
7.3.1, albeit it's a regression from 7.2.  After all, 7.3.1 is
just a month away.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Mar 19 05:01:29 2024 UTC