php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Request #72655 How to speed up hashing between 40% and 50%
Submitted: 2016-07-22 22:50 UTC Modified: -
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: simonhf at gmail dot com Assigned:
Status: Open Package: Strings related
PHP Version: 7.0.9 OS: Irrelevant
Private report: No CVE-ID: None
View Add Comment Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
You can add a comment by following this link or if you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: simonhf at gmail dot com
New email:
PHP Version: OS:

 

 [2016-07-22 22:50 UTC] simonhf at gmail dot com
Description:
------------
I saw that the hash function is a relatively old byte by byte hash function. Newer hash functions like murmurhash3 work on multiple bytes at a time and are therefore faster for bigger strings hashed.

I instrumented the existing hash function to capture each string hashed. When running php -v then about 8000 strings were captured. They were all under 39 bytes long.

When racing the existing hash function against mmh3 then they are about the same, and the only advantage is that mmh3 would go faster if the strings were longer... which they are not.

However, if I pad out the 8000 real world strings to the next 16 bytes and re-race then mmh3 is about 40% to 50% faster than the existing hash algorithm.

Why does mmh3 do better this time? Because it works much faster on blocks of 16 bytes. It processes any tail of up to 15 bytes, byte by byte still. By padding out the 8000 real world strings to the next 16 bytes then mmh3 never had to run its byte by byte tail code and is therefore much faster.

This got me to thinking that if PHP internals were changed to pad all strings to the next 16 byte boundary and use mmh3 instead then wouldn't this lead to a measurable run-time performance increase in exchange for a slight run-time memory increase?

[1] https://github.com/php/php-src/blob/PHP-7.1/Zend/zend_string.h#L325

Test script:
---------------
Key,value store is a little faster, memory is a little bigger.


Patches

Add a Patch

Pull Requests

Add a Pull Request

 
PHP Copyright © 2001-2019 The PHP Group
All rights reserved.
Last updated: Sat Nov 16 02:01:36 2019 UTC