|  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Request #72655 How to speed up hashing between 40% and 50%
Submitted: 2016-07-22 22:50 UTC Modified: -
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: simonhf at gmail dot com Assigned:
Status: Open Package: Strings related
PHP Version: 7.0.9 OS: Irrelevant
Private report: No CVE-ID: None
View Add Comment Developer Edit
Anyone can comment on a bug. Have a simpler test case? Does it work for you on a different platform? Let us know!
Just going to say 'Me too!'? Don't clutter the database with that please — but make sure to vote on the bug!
Your email address:
Solve the problem:
30 - 2 = ?
Subscribe to this entry?

 [2016-07-22 22:50 UTC] simonhf at gmail dot com
I saw that the hash function is a relatively old byte by byte hash function. Newer hash functions like murmurhash3 work on multiple bytes at a time and are therefore faster for bigger strings hashed.

I instrumented the existing hash function to capture each string hashed. When running php -v then about 8000 strings were captured. They were all under 39 bytes long.

When racing the existing hash function against mmh3 then they are about the same, and the only advantage is that mmh3 would go faster if the strings were longer... which they are not.

However, if I pad out the 8000 real world strings to the next 16 bytes and re-race then mmh3 is about 40% to 50% faster than the existing hash algorithm.

Why does mmh3 do better this time? Because it works much faster on blocks of 16 bytes. It processes any tail of up to 15 bytes, byte by byte still. By padding out the 8000 real world strings to the next 16 bytes then mmh3 never had to run its byte by byte tail code and is therefore much faster.

This got me to thinking that if PHP internals were changed to pad all strings to the next 16 byte boundary and use mmh3 instead then wouldn't this lead to a measurable run-time performance increase in exchange for a slight run-time memory increase?


Test script:
Key,value store is a little faster, memory is a little bigger.


Add a Patch

Pull Requests

Add a Pull Request

PHP Copyright © 2001-2021 The PHP Group
All rights reserved.
Last updated: Sun Sep 19 12:03:36 2021 UTC