php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Request #72655 How to speed up hashing between 40% and 50%
Submitted: 2016-07-22 22:50 UTC Modified: -
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: simonhf at gmail dot com Assigned:
Status: Open Package: Strings related
PHP Version: 7.0.9 OS: Irrelevant
Private report: No CVE-ID: None
Welcome back! If you're the original bug submitter, here's where you can edit the bug or add additional notes.
If this is not your bug, you can add a comment by following this link.
If this is your bug, but you forgot your password, you can retrieve your password here.
Password:
Status:
Package:
Bug Type:
Summary:
From: simonhf at gmail dot com
New email:
PHP Version: OS:

 

 [2016-07-22 22:50 UTC] simonhf at gmail dot com
Description:
------------
I saw that the hash function is a relatively old byte by byte hash function. Newer hash functions like murmurhash3 work on multiple bytes at a time and are therefore faster for bigger strings hashed.

I instrumented the existing hash function to capture each string hashed. When running php -v then about 8000 strings were captured. They were all under 39 bytes long.

When racing the existing hash function against mmh3 then they are about the same, and the only advantage is that mmh3 would go faster if the strings were longer... which they are not.

However, if I pad out the 8000 real world strings to the next 16 bytes and re-race then mmh3 is about 40% to 50% faster than the existing hash algorithm.

Why does mmh3 do better this time? Because it works much faster on blocks of 16 bytes. It processes any tail of up to 15 bytes, byte by byte still. By padding out the 8000 real world strings to the next 16 bytes then mmh3 never had to run its byte by byte tail code and is therefore much faster.

This got me to thinking that if PHP internals were changed to pad all strings to the next 16 byte boundary and use mmh3 instead then wouldn't this lead to a measurable run-time performance increase in exchange for a slight run-time memory increase?

[1] https://github.com/php/php-src/blob/PHP-7.1/Zend/zend_string.h#L325

Test script:
---------------
Key,value store is a little faster, memory is a little bigger.


Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2023-03-11 06:26 UTC] croverwnorene8 at googlemail dot com
Thanks for the information..   https://www.umrproviderportal.org/github.com
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Apr 19 23:01:28 2024 UTC