|  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #70644 trivial hash complexity DoS attack
Submitted: 2015-10-05 15:34 UTC Modified: 2015-10-10 16:14 UTC
Avg. Score:3.9 ± 1.4
Reproduced:5 of 7 (71.4%)
Same Version:4 (80.0%)
Same OS:3 (60.0%)
From: Assigned:
Status: Open Package: Scripting Engine problem
PHP Version: 5.6.14 OS:
Private report: No CVE-ID: None
View Add Comment Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
You can add a comment by following this link or if you reported this bug, you can edit this bug over here.
Block user comment
Status: Assign to:
Bug Type:
New email:
PHP Version: OS:


 [2015-10-05 15:34 UTC]
As reported here: by brian m. carlson:

Applies to all PHP versions.

PHP uses the DJB "times 33" hash to hash strings in its hash tables,
without the use of any secret key.  Hash values are therefore the same
between multiple invocations.  As a result, it's trivial to precompute a
set of values that all hash to the same bucket and cause positively
abysmal performance.

If a script accepts untrusted hash keys, such as from JSON input, it is
subject to a DoS attack.  PHP implemented the max_input_vars option, but
this is not effective in the general case, especially in the era of
JSON-laden POST requests.  Perl, Python, and Ruby have all addressed
their CVEs properly, but PHP has not and as a result is still

Cloning my example repository[0] and running
"php scripts/exploited.php < example/1048576.json" demonstrates the
problem very quickly.  The similar Perl and Python scripts are not
vulnerable to this attack.  A JSON file containing only 65536 entries
takes PHP 5.6 22 seconds to process.

A new CVE should probably be allocated and the bug should be fixed
correctly this time, probably by seeding a key from /dev/urandom and
using SipHash-2-4 or the like.

Python had CVE-2012-1150 and CVE-2013-7040.  Ruby had CVE-2011-4815.  I
can't find a CVE for Perl's 2003 fix, if one exists.  The fix, which
went into 5.8, was incomplete and was addressed by CVE-2013-1667.


Test script:


Add a Patch

Pull Requests

Add a Pull Request


AllCommentsChangesGit/SVN commitsRelated reports
 [2015-10-06 07:26 UTC] phpmpan at mpan dot pl
Note: there already exists a PHP extension for SipHash [1]. Unfortunely of unknown licence, but it is based on public domain version [2], which also may be used as a quick fix. Both of them linked from Aumasson's website.

[1] <>
[2] <>
 [2015-10-06 07:39 UTC]
The main problem with switching to SipHash is that hashtables in PHP very often use integer keys. While we could probably start using SipHash for string keys (we need to test the performance impact here), I don't think it is realistic for integer keys. I don't know if there's some alternative, very cheap cryptographic hash for that case.

The simplest "solution" to this problem I can think of is to keep track of collisions and bail out if some function of the number of collisions and hashtable elements becomes too large. It would likely be enough to count collisions only during rehashes and check them at the end of the operation, so there should be no performance concerns here.
 [2015-10-06 11:26 UTC]
Looks like they has a same problem in Rust language and the solution was to use FNV hashing:

Perhaps that would a way to go for PHP as well?
 [2015-10-06 11:32 UTC]
Further investigation shows that at least in Python it was shown that FNV hashing is not enough to protect against collision attacks and Python has converted everything to SipHash-2-4:
 [2015-10-07 16:24 UTC]
Here's a branch for testing siphash (doesn't use a key, just perf experiment):

From a few quick tests, SipHash doesn't seem to have significant impact on our standard benchmarks (though I didn't do instruction counts). On a synthetic benchmark (inserting English dict in a hashtable) I got something like 10% slowdown.

So for string hashes, this is an option. The integer hash problem stays though...
 [2015-10-07 16:39 UTC]
Also, the idea about counting collisions during rehash won't work either. It won't be able to distinguish between few collisions on many elements and many collisions on few elements. So for size=8 both keys 0 1 2 3 8 9 10 11 and 0 1 2 3 8 16 24 would result in 4 collisions, but only the latter kind would be problematic (for larger sizes of course). We'd have to do per-hash-slot counts, which would no longer be cheap.
 [2015-10-08 09:45 UTC]
-Package: *Encryption and hash functions +Package: Scripting Engine problem
 [2015-10-08 09:45 UTC]
Here is an implementation for collision counting:

Collisions are counted during insert operations, if they exceed a certain level, you get a fatal error. From some initial testing, this doesn't seem to have significant perf impact.

One problem is that this approach doesn't work with add_new, so I had to rewrite some code to move away from it.
 [2015-10-08 09:56 UTC]
One way to potentially make the integer hashing feasible, is to introduce an intermediary step between packed hashes and unpacked ones:

Packed hashes only work if the numeric array is both dense and sorted ascending by key. We could add another level that works for dense numeric arrays in general, but does not require a certain key order. In this case the data array would contain the elements in arbitrary order and the hash array would map key to order. With that extra level, we'd only start hashing keys for sparse numeric arrays.

Allowing integer hashes at all will require us to switch to a key union. This may negatively affect array iteration.
 [2015-10-08 22:52 UTC]
Just looked at the HHVM implementation. Looks like they are also doing collision counting during find-phase of insert operations:

It took a while, but here's an implementation of using siphash for everything (including integers):
 [2015-10-10 16:14 UTC]
Just got around to doing some microbenchmarks. For switching everything to siphash (incl integer hashes):

                                   SipHash                Baseline
Packed int insert:                 16.9 mus +/- 0.3 mus   17.1 mus +/- 0.1 mus     -1.0%
Reverse int insert:                49.0 mus +/- 0.4 mus   17.3 mus +/- 0.2 mus   +182.7%
Reverse int insert x16 collisions: 54.0 mus +/- 1.1 mus   43.3 mus +/- 0.2 mus    +24.6%
Incrementing string insert:        62.6 mus +/- 0.7 mus   44.8 mus +/- 0.3 mus    +39.6%

This means unpacked integer hashtable is about 3x slower. The break-even point with the previous implementation is somewhere between 16x and 32x collisions per bucket. For very short strings w/o hash cache we have 40% slowdown.

It's unlikely that this will be deemed acceptable even with more optimization work (like the semi-packed variant mentioned above). So we should probably go with the collision counting during insert.
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Jun 20 10:01:31 2024 UTC