Bug #70644 trivial hash complexity DoS attack
Submitted: 2015-10-05 15:34 UTC Modified: 2015-10-10 16:14 UTC
Votes: 11
Avg. Score: 3.9 ± 1.4
Reproduced: 5 of 7 (71.4%)
Same Version: 4 (80.0%)
Same OS: 3 (60.0%)
From: ondrej@php.net Assigned:
Status: Open Package: Scripting Engine problem
PHP Version: 5.6.14 OS:
Private report: No CVE-ID: None
 [2015-10-05 15:34 UTC] ondrej@php.net
Description:
------------
As reported here: https://bugs.debian.org/800564 by brian m. carlson:

Applies to all PHP versions.

PHP uses the DJB "times 33" hash to hash strings in its hash tables,
without the use of any secret key.  Hash values are therefore the same
between multiple invocations.  As a result, it's trivial to precompute a
set of values that all hash to the same bucket and cause positively
abysmal performance.
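
For reference, the hash in question is a one-accumulator multiply-and-add
loop; a minimal C sketch in the spirit of the engine's
zend_inline_hash_func (simplified names, not the exact engine code):

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch of the DJB "times 33" (DJBX33A) string hash. The seed 5381
     * and multiplier 33 are fixed constants and there is no secret key,
     * so a given key hashes identically in every process on every
     * machine -- which is what makes collisions precomputable. */
    static uint64_t djbx33a(const char *key, size_t len)
    {
        uint64_t hash = 5381;
        while (len-- > 0) {
            hash = ((hash << 5) + hash) + (unsigned char)*key++; /* hash*33 + c */
        }
        return hash;
    }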

If a script accepts untrusted hash keys, such as from JSON input, it is
subject to a DoS attack.  PHP implemented the max_input_vars option, but
this is not effective in the general case, especially in the era of
JSON-laden POST requests.  Perl, Python, and Ruby have all addressed
their CVEs properly, but PHP has not and as a result is still
vulnerable.

Cloning my example repository[0] and running
"php scripts/exploited.php < example/1048576.json" demonstrates the
problem very quickly.  The similar Perl and Python scripts are not
vulnerable to this attack.  PHP 5.6 takes 22 seconds to process a JSON
file containing only 65536 entries.
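
To illustrate how such inputs can be precomputed (a sketch of the
standard trick, not the repository's actual generator): under times-33
the two-byte strings "Ez" and "FY" hash identically (69*33+122 ==
70*33+89), and because the hash state is a single accumulator, any
concatenation of colliding blocks collides again, giving 2^n equal-hash
keys of length 2n:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Same times-33 sketch as above. */
    static uint64_t djbx33a(const char *s, size_t len)
    {
        uint64_t h = 5381;
        while (len-- > 0) h = h * 33 + (unsigned char)*s++;
        return h;
    }

    int main(void)
    {
        const char *block[2] = { "Ez", "FY" };  /* colliding 2-byte blocks */
        unsigned n = 4;                         /* 2^4 = 16 colliding keys */
        char key[64];

        for (unsigned mask = 0; mask < (1u << n); mask++) {
            key[0] = '\0';
            for (unsigned i = 0; i < n; i++)
                strcat(key, block[(mask >> i) & 1]);
            printf("%-10s -> %llu\n", key,
                   (unsigned long long)djbx33a(key, strlen(key)));
        }
        return 0;
    }

Every key printed lands in the same bucket, so each insert degenerates
into a linear scan of one collision chain and building the table costs
O(n^2) overall.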

A new CVE should probably be allocated and the bug should be fixed
correctly this time, probably by seeding a key from /dev/urandom and
using SipHash-2-4 or the like.
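
The keying half of that proposal is cheap; a sketch of seeding a
128-bit SipHash key once per process (hypothetical names, error
handling abbreviated):

    #include <stdio.h>
    #include <stdlib.h>

    /* SipHash-2-4 takes a 16-byte secret key. Drawing it from
     * /dev/urandom at startup makes hash values unpredictable across
     * invocations, which is exactly the property times-33 lacks. */
    static unsigned char hash_key[16];

    static void seed_hash_key(void)
    {
        FILE *f = fopen("/dev/urandom", "rb");
        if (!f || fread(hash_key, 1, sizeof hash_key, f) != sizeof hash_key) {
            perror("seeding hash key");
            exit(EXIT_FAILURE);
        }
        fclose(f);
    }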

Python had CVE-2012-1150 and CVE-2013-7040.  Ruby had CVE-2011-4815.  I
can't find a CVE for Perl's 2003 fix, if one exists.  That fix, which
went into 5.8, was incomplete; the remainder was addressed as
CVE-2013-1667.

[0] https://github.com/bk2204/php-hash-dos


Test script:
---------------
https://github.com/bk2204/php-hash-dos


History

 [2015-10-06 07:26 UTC] phpmpan at mpan dot pl
Note: a PHP extension for SipHash already exists [1]. Unfortunately its licence is unknown, but it is based on a public-domain version [2], which could also be used as a quick fix. Both are linked from Aumasson's website.

[1] <https://github.com/jedisct1/siphash-php>
[2] <https://github.com/floodyberry/siphash>
 [2015-10-06 07:39 UTC] nikic@php.net
The main problem with switching to SipHash is that hashtables in PHP very often use integer keys. While we could probably start using SipHash for string keys (we need to test the performance impact here), I don't think it is realistic for integer keys. I don't know if there's some alternative, very cheap cryptographic hash for that case.

The simplest "solution" to this problem I can think of is to keep track of collisions and bail out if some function of the number of collisions and hashtable elements becomes too large. It would likely be enough to count collisions only during rehashes and check them at the end of the operation, so there should be no performance concerns here.
 [2015-10-06 11:26 UTC] ondrej@php.net
It looks like the Rust language had the same problem, and the solution there was to use FNV hashing: https://github.com/rust-lang/rust/issues/10586

https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function

Perhaps that would be a way to go for PHP as well?
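
For reference, FNV-1a is barely more work per byte than times-33; a
sketch of the 64-bit variant:

    #include <stddef.h>
    #include <stdint.h>

    /* 64-bit FNV-1a: XOR each byte into the state, then multiply by the
     * FNV prime. Like times-33 it is fast but unkeyed, which is why it
     * later turned out to be insufficient (see the next comment). */
    static uint64_t fnv1a_64(const char *key, size_t len)
    {
        uint64_t hash = 14695981039346656037ULL;  /* FNV offset basis */
        while (len-- > 0) {
            hash ^= (unsigned char)*key++;
            hash *= 1099511628211ULL;             /* FNV prime */
        }
        return hash;
    }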
 [2015-10-06 11:32 UTC] ondrej@php.net
Further investigation shows that FNV hashing was found insufficient to protect against collision attacks, at least in Python, which has since converted everything to SipHash-2-4: https://www.python.org/dev/peps/pep-0456/
 [2015-10-07 16:24 UTC] nikic@php.net
Here's a branch for testing SipHash (it doesn't use a key; this is just a perf experiment): https://github.com/php/php-src/compare/master...nikic:siphash

From a few quick tests, SipHash doesn't seem to have a significant impact on our standard benchmarks (though I didn't do instruction counts). On a synthetic benchmark (inserting an English dictionary into a hashtable) I saw something like a 10% slowdown.

So for string hashes, this is an option. The integer hash problem stays though...
 [2015-10-07 16:39 UTC] nikic@php.net
Also, the idea of counting collisions during rehash won't work either: it cannot distinguish between few collisions on many elements and many collisions on few elements. For size=8, both the keys 0 1 2 3 8 9 10 11 and the keys 0 1 2 3 8 16 24 32 result in 4 collisions, but only the latter kind would be problematic (for larger sizes, of course). We'd have to do per-hash-slot counts, which would no longer be cheap.
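
A quick check of those numbers (slot = key & 7 for a size-8 table)
confirms that a table-wide total can't tell the two cases apart:

    #include <stdio.h>

    /* Both key sets produce 4 total collisions in a size-8 table, but
     * only the second funnels everything into one slot; telling them
     * apart needs the per-slot maximum, not the table-wide total. */
    int main(void)
    {
        int a[] = { 0, 1, 2, 3, 8, 9, 10, 11 };  /* benign: chains of length 2 */
        int b[] = { 0, 1, 2, 3, 8, 16, 24, 32 }; /* bad: one chain of length 5 */
        int *sets[2] = { a, b };

        for (int s = 0; s < 2; s++) {
            int count[8] = { 0 }, total = 0, max = 0;
            for (int i = 0; i < 8; i++) {
                int slot = sets[s][i] & 7;  /* integer keys hash to themselves */
                if (count[slot]++ > 0) total++;
                if (count[slot] > max) max = count[slot];
            }
            printf("set %c: %d total collisions, longest chain %d\n",
                   'A' + s, total, max);
        }
        return 0;
    }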
 [2015-10-08 09:45 UTC] nikic@php.net
-Package: *Encryption and hash functions +Package: Scripting Engine problem
 [2015-10-08 09:45 UTC] nikic@php.net
Here is an implementation for collision counting: https://github.com/php/php-src/compare/master...nikic:collisionCount

Collisions are counted during insert operations; if they exceed a certain level, you get a fatal error. From some initial testing, this doesn't seem to have a significant perf impact.

One problem is that this approach doesn't work with add_new, so I had to rewrite some code to move away from it.
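
A minimal sketch of the shape of that check (hypothetical names and
threshold, not the code in the linked branch): count the occupied
entries walked during the find phase of an insert and bail out past a
fixed bound.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_COLLISIONS 1000  /* hypothetical bound */

    struct bucket { uint64_t h; const char *key; struct bucket *next; };

    /* Insert into a chained table (nslots must be a power of two),
     * counting entries traversed in the target chain. Legitimate
     * workloads rarely build long chains; an attacker funnelling keys
     * into one bucket trips the bound long before the table degenerates
     * into a linked list. */
    static int insert(struct bucket **table, size_t nslots,
                      uint64_t h, const char *key)
    {
        unsigned collisions = 0;
        struct bucket **slot = &table[h & (nslots - 1)];

        for (struct bucket *b = *slot; b; b = b->next) {
            if (b->h == h && strcmp(b->key, key) == 0)
                return 0;                /* key already present */
            if (++collisions > MAX_COLLISIONS)
                return -1;               /* caller raises the fatal error */
        }

        struct bucket *b = malloc(sizeof *b);
        if (!b) return -1;
        b->h = h; b->key = key; b->next = *slot;
        *slot = b;
        return 0;
    }

The engine's actual tables chain colliding buckets through indices
rather than pointers, but the check has the same shape.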
 [2015-10-08 09:56 UTC] nikic@php.net
One way to potentially make integer hashing feasible is to introduce an intermediary step between packed hashes and unpacked ones:

Packed hashes only work if the numeric array is both dense and sorted ascending by key. We could add another level that works for dense numeric arrays in general, but does not require a certain key order. In this case the data array would contain the elements in arbitrary order and the hash array would map key to order. With that extra level, we'd only start hashing keys for sparse numeric arrays.
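
Sketched as data layouts (hypothetical field names; the real zend_array
differs):

    #include <stdint.h>

    /* Level 1 -- packed: keys are exactly 0..n-1 in ascending order, so
     * a value's position is its key and no hashing happens at all. */
    typedef struct { void **data; uint32_t n; } packed_array;

    /* Level 2 -- the proposed dense-but-unordered form: keys are still
     * 0..n-1 but arrive in arbitrary order, so a flat index maps each
     * key to its slot in the data array. Still no hash function. */
    typedef struct {
        void    **data;   /* values in insertion order    */
        uint32_t *order;  /* order[key] = index into data */
        uint32_t  n;
    } dense_array;

    /* Level 3 -- fully hashed: only sparse numeric keys would go through
     * the (then keyed) hash function and bucket table, as strings do. */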

Allowing integer hashes at all will require us to switch to a key union. This may negatively affect array iteration.
 [2015-10-08 22:52 UTC] nikic@php.net
Just looked at the HHVM implementation. It looks like they are also doing collision counting during the find phase of insert operations: http://lxr.php.net/xref/OTHER_IMPLEMENT/hiphop-2.0/hphp/runtime/base/array/hphp_array.cpp#555

It took a while, but here's an implementation of using siphash for everything (including integers): https://github.com/php/php-src/compare/master...nikic:integerHash
 [2015-10-10 16:14 UTC] nikic@php.net
Just got around to doing some microbenchmarks for switching everything to SipHash (incl. integer hashes):

                                   SipHash            Baseline
Packed int insert:                 16.9 µs ± 0.3 µs   17.1 µs ± 0.1 µs     -1.0%
Reverse int insert:                49.0 µs ± 0.4 µs   17.3 µs ± 0.2 µs   +182.7%
Reverse int insert x16 collisions: 54.0 µs ± 1.1 µs   43.3 µs ± 0.2 µs    +24.6%
Incrementing string insert:        62.6 µs ± 0.7 µs   44.8 µs ± 0.3 µs    +39.6%

This means the unpacked integer hashtable is about 3x slower. The break-even point with the previous implementation is somewhere between 16x and 32x collisions per bucket. For very short strings without the hash cache we see a 40% slowdown.

It's unlikely that this will be deemed acceptable even with more optimization work (like the semi-packed variant mentioned above). So we should probably go with the collision counting during insert.
 