Bug #77137 preg_replace memory exhaustion
Submitted: 2018-11-10 23:31 UTC
Modified: 2018-11-11 00:10 UTC
From: php at abiusx dot com
Assigned:
Status: Not a bug
Package: *Regular Expressions
PHP Version: 7.2.12
OS: macOS Mojave
Private report: No
CVE-ID: None
 [2018-11-10 23:31 UTC] php at abiusx dot com
Description:
------------
The following code will result in memory exhaustion and very long execution times:
preg_replace(array_fill(0, 10, '/()/'),str_repeat("&", 10),NULL);

https://3v4l.org/LnV37


Either add an example to the docs, or explain that pattern/replacement should not come from user input.

Test script:
---------------
<?php
preg_replace(array_fill(0, 10, '/()/'),str_repeat("&", 10),NULL);



 [2018-11-10 23:38 UTC] spam2 at rhsoft dot net
> or explain that pattern/replacement should not come from user input

breaking news: until you prove the opposite, user input is always bad
 [2018-11-10 23:44 UTC] php at abiusx dot com
preg_replace and similar functions are regularly used to parse, sanitize and validate user input.

I admit in this case, the pattern also needs to be something very specific, but the replacement string is the major factor here, which can come from the user in many scenarios.

I was unable to see any reference to "&" in the docs. I am assuming it's a backreference and thus the computational complexity is growing exponentially, but that'd be a very good point to make in the docs (or at least in the comments, or for future reference just in case someone googles it).
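For future reference, a minimal sketch of how "&" behaves in a preg_replace replacement string. In PHP the whole match is referenced with $0 or \0 and groups with $1, \1 and so on; "&" is inserted literally. The sample subject and patterns here are illustrative only.

```php
<?php
// "&" has no special meaning in a preg_replace replacement string.
$literal = preg_replace('/b/', '&', 'abc');

// The whole match is referenced with $0 (or \0), capture groups with $1, \1, ...
$wholeMatch = preg_replace('/b/', '[$0]', 'abc');
$group      = preg_replace('/(b)/', '<\1>', 'abc');

var_dump($literal, $wholeMatch, $group); // "a&c", "a[b]c", "a<b>c"
```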
 [2018-11-10 23:55 UTC] requinix@php.net
-Status: Open
+Status: Not a bug
 [2018-11-10 23:55 UTC] requinix@php.net
The problem here is you don't know what you're doing.

Here's an exercise:
1. Try your code with a different replacement character than "&".
2. Try with only one /()/ regex and note the final result.
3. Try with two regexes and note the result.
4. Try with three.
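The exercise above can be sketched as follows, with the pattern count kept small so it finishes instantly. /()/ matches the empty string at every position, so one pass over a subject of length L inserts the 10-character replacement L + 1 times and yields length 11L + 10. Starting from an empty subject, n passes give 11^n - 1 bytes, which for n = 10 is 25,937,424,600 bytes. The replacement character "X" is an arbitrary choice.

```php
<?php
// Apply N copies of the empty-matching pattern /()/ to an empty subject
// and record the resulting length. N stays small on purpose.
$replacement = str_repeat('X', 10); // any character behaves like "&"

$lengths = [];
for ($n = 1; $n <= 3; $n++) {
    $result = preg_replace(array_fill(0, $n, '/()/'), $replacement, '');
    $lengths[$n] = strlen($result);
}

print_r($lengths); // 1 => 10, 2 => 120, 3 => 1330
```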
 [2018-11-11 00:04 UTC] php at abiusx dot com
Why is this community so toxic.

1. I know what I am doing.
2. This is not my code.
3. We're running a PHP anti-malware sandbox on a large (20TB) codebase and this is one of the instances that is crashing the sandbox (unexpected behavior)

That's why it's being shared. If you don't like it, don't look at it.
 [2018-11-11 00:10 UTC] requinix@php.net
This isn't a community. This is a bug tracker. There's a certain expectation that people who submit bug reports have done some investigation into their own suppositions to determine whether their problem is, in fact, with PHP or whether there is something wrong with their own code.

In this case it was the latter. And it was fairly easy to show that.
 [2018-11-11 03:32 UTC] spam2 at rhsoft dot net
if you knew what you were doing you would have tried preg_replace(array_fill(0, 10, '/()/'),str_repeat("X", 10),'');

instead you wrote "I was unable to see any reference to & in the docs. I am assuming it's a backreference and thus the computational complexity is growing exponentially" which is nonsense

the problem is your pattern, so "or explain that pattern/replacement should not come from user input" is nonsense too, while it is pretty clear to everyone that the pattern must not come from user input

besides that when you do a preg_replace on an empty string or NULL in live code you don't know what you are doing because the whole code could be wrapped in if(!empty($subject)) to save resources from the start
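A rough sketch of such a guard, combined with a size cap for cases like the sandbox mentioned earlier. bounded_preg_replace and its $maxBytes parameter are hypothetical names, not part of PHP. The sketch compares against null and '' explicitly because empty("0") is true. Returning '' for an empty subject is an assumption and differs from plain preg_replace, which would still apply an empty-matching pattern.

```php
<?php
// Hypothetical helper (not a PHP built-in): applies the patterns one at a
// time and stops once the intermediate result exceeds $maxBytes.
function bounded_preg_replace(array $patterns, string $replacement, ?string $subject, int $maxBytes = 1048576): ?string
{
    // Guard: skip the work entirely for an empty or NULL subject.
    if ($subject === null || $subject === '') {
        return '';
    }
    foreach ($patterns as $pattern) {
        $subject = preg_replace($pattern, $replacement, $subject);
        if ($subject === null) {
            return null; // PCRE error, see preg_last_error()
        }
        if (strlen($subject) > $maxBytes) {
            return null; // give up before the next pass multiplies the size again
        }
    }
    return $subject;
}

// Usage: the reported pattern list on a one-byte subject is cut off early.
$out = bounded_preg_replace(array_fill(0, 10, '/()/'), str_repeat('&', 10), 'a', 100000);
var_dump($out); // NULL
```

The cap is checked only after each pass, so a single pass can still grow the string by a factor of about 11 with this replacement before the check runs; memory_limit remains the hard backstop.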
 