go to bug id or search bugs for
As reported here: https://bugs.debian.org/800564 by brian m. carlson:
Applies to all PHP versions.
PHP uses the DJB "times 33" hash to hash strings in its hash tables,
without the use of any secret key. Hash values are therefore the same
between multiple invocations. As a result, it's trivial to precompute a
set of values that all hash to the same bucket and cause positively
If a script accepts untrusted hash keys, such as from JSON input, it is
subject to a DoS attack. PHP implemented the max_input_vars option, but
this is not effective in the general case, especially in the era of
JSON-laden POST requests. Perl, Python, and Ruby have all addressed
their CVEs properly, but PHP has not and as a result is still
Cloning my example repository and running
"php scripts/exploited.php < example/1048576.json" demonstrates the
problem very quickly. The similar Perl and Python scripts are not
vulnerable to this attack. A JSON file containing only 65536 entries
takes PHP 5.6 22 seconds to process.
A new CVE should probably be allocated and the bug should be fixed
correctly this time, probably by seeding a key from /dev/urandom and
using SipHash-2-4 or the like.
Python had CVE-2012-1150 and CVE-2013-7040. Ruby had CVE-2011-4815. I
can't find a CVE for Perl's 2003 fix, if one exists. The fix, which
went into 5.8, was incomplete and was addressed by CVE-2013-1667.
Add a Patch
Add a Pull Request
Note: there already exists a PHP extension for SipHash . Unfortunely of unknown licence, but it is based on public domain version , which also may be used as a quick fix. Both of them linked from Aumasson's website.
The main problem with switching to SipHash is that hashtables in PHP very often use integer keys. While we could probably start using SipHash for string keys (we need to test the performance impact here), I don't think it is realistic for integer keys. I don't know if there's some alternative, very cheap cryptographic hash for that case.
The simplest "solution" to this problem I can think of is to keep track of collisions and bail out if some function of the number of collisions and hashtable elements becomes too large. It would likely be enough to count collisions only during rehashes and check them at the end of the operation, so there should be no performance concerns here.
Looks like they has a same problem in Rust language and the solution was to use FNV hashing: https://github.com/rust-lang/rust/issues/10586
Perhaps that would a way to go for PHP as well?
Further investigation shows that at least in Python it was shown that FNV hashing is not enough to protect against collision attacks and Python has converted everything to SipHash-2-4: https://www.python.org/dev/peps/pep-0456/
Here's a branch for testing siphash (doesn't use a key, just perf experiment): https://github.com/php/php-src/compare/master...nikic:siphash
From a few quick tests, SipHash doesn't seem to have significant impact on our standard benchmarks (though I didn't do instruction counts). On a synthetic benchmark (inserting English dict in a hashtable) I got something like 10% slowdown.
So for string hashes, this is an option. The integer hash problem stays though...
Also, the idea about counting collisions during rehash won't work either. It won't be able to distinguish between few collisions on many elements and many collisions on few elements. So for size=8 both keys 0 1 2 3 8 9 10 11 and 0 1 2 3 8 16 24 would result in 4 collisions, but only the latter kind would be problematic (for larger sizes of course). We'd have to do per-hash-slot counts, which would no longer be cheap.
Here is an implementation for collision counting: https://github.com/php/php-src/compare/master...nikic:collisionCount
Collisions are counted during insert operations, if they exceed a certain level, you get a fatal error. From some initial testing, this doesn't seem to have significant perf impact.
One problem is that this approach doesn't work with add_new, so I had to rewrite some code to move away from it.
One way to potentially make the integer hashing feasible, is to introduce an intermediary step between packed hashes and unpacked ones:
Packed hashes only work if the numeric array is both dense and sorted ascending by key. We could add another level that works for dense numeric arrays in general, but does not require a certain key order. In this case the data array would contain the elements in arbitrary order and the hash array would map key to order. With that extra level, we'd only start hashing keys for sparse numeric arrays.
Allowing integer hashes at all will require us to switch to a key union. This may negatively affect array iteration.
Just looked at the HHVM implementation. Looks like they are also doing collision counting during find-phase of insert operations: http://lxr.php.net/xref/OTHER_IMPLEMENT/hiphop-2.0/hphp/runtime/base/array/hphp_array.cpp#555
It took a while, but here's an implementation of using siphash for everything (including integers): https://github.com/php/php-src/compare/master...nikic:integerHash
Just got around to doing some microbenchmarks. For switching everything to siphash (incl integer hashes):
Packed int insert: 16.9 mus +/- 0.3 mus 17.1 mus +/- 0.1 mus -1.0%
Reverse int insert: 49.0 mus +/- 0.4 mus 17.3 mus +/- 0.2 mus +182.7%
Reverse int insert x16 collisions: 54.0 mus +/- 1.1 mus 43.3 mus +/- 0.2 mus +24.6%
Incrementing string insert: 62.6 mus +/- 0.7 mus 44.8 mus +/- 0.3 mus +39.6%
This means unpacked integer hashtable is about 3x slower. The break-even point with the previous implementation is somewhere between 16x and 32x collisions per bucket. For very short strings w/o hash cache we have 40% slowdown.
It's unlikely that this will be deemed acceptable even with more optimization work (like the semi-packed variant mentioned above). So we should probably go with the collision counting during insert.
Related To: Bug #72726