php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Request #74851 uniqid() without more_entropy performs badly
Submitted: 2017-07-04 03:19 UTC Modified: 2017-07-19 17:53 UTC
From: manu at netbsd dot org Assigned:
Status: Closed Package: Performance problem
PHP Version: 7.1.6 OS: NetBSD
Private report: No CVE-ID: None
 [2017-07-04 03:19 UTC] manu at netbsd dot org
Description:
------------
uniqid() uses system microsecond-precise clock to produce an unique identifier. In order to avoid producing the same value, it will always call usleep() to wait for the next microsecond.

The problem here is that usleep() may wait for much longer. According to Opengroup's POSIX, "The suspension time may be longer than requested due to the scheduling of other activity by the system". Indeed tests on NetBSD show that the kernel will schedule another process during the usleep() call, resulting in a typical uniqid() duration around 16 ms, which is around 16000 time slower than intended. 

I suggest to address the problem by using uuidgen() system call if available. This system call is not in POSIX standard, but will be available on most modern systems. The call costs around a microsecond, and it produce a much better unique identifier than what uniqid() currently does. Attached is a patch against PHP 7.1.6 that cause PHP's configure script to look up uuidgen(), and that uses it in uniqid() if it is available. 

The same problem was reported at lease in bugs #65626, #37106, #37840 and #14248, but with no satisfying fix proposed, in my opinion.

Test script:
---------------
Test to compare uniqid() performance across systems:
time php -r 'for ($i = 0; $i < 1000; $i++) uniqid();'



Patches

0001-uniqid-performance-fix.patch (last revision 2017-07-19 02:40 UTC by manu at netbsd dot org)
patch3-ext_standard_uniqid.c (last revision 2017-07-08 14:47 UTC by manu at netbsd dot org)
patch2-ext_standard_uniqid.c (last revision 2017-07-05 15:49 UTC by manu at netbsd dot org)
patch-ext_standard_uniqid.c (last revision 2017-07-04 03:20 UTC by manu at netbsd dot org)

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2017-07-04 21:38 UTC] cmb@php.net
Note, that there is already a related RFC[1] including a
discussion on internals[2].

[1] <https://wiki.php.net/rfc/uniqid>
[2] <http://marc.info/?t=147339817200002&r=1&w=2>
 [2017-07-04 23:01 UTC] manu at netbsd dot org
The internals discussion ends with the idea of returning an UUID, which is exactly what the provided patch in this bug report does.
 [2017-07-04 23:25 UTC] nikic@php.net
This has been discussed a couple of times already and the argument is basically always the same: We're don't want to change the uniqid output format for BC reasons. In particular changing the length is problematic, as uniqid output is stored in length-limited database fields (for example). Switching uniqid to use UUIDs would certainly cause breakage.

There is an active proposal to expose a UUID interface (without changing the uniqid function): https://wiki.php.net/rfc/uuid

I think what we should do here is deprecate uniqid() in favor of bin2hex(random_bytes(16)), or the proposed UUID interface.
 [2017-07-05 02:11 UTC] manu at netbsd dot org
Right, we have a double constraint: do not change output format for backward compatibility, but also change it to increase performance and uniqueness.

Some proposed to deprecate uniqid(), but this does not address the backward compatibility ptoblem, quite the contrary.

What about adding an uniqid.version option to php.ini to govern uniqid() behavior? Default would uniqid.version=0 for backward compatible behavior, and uniqid.version=1 would enable UUID output.
 [2017-07-05 15:51 UTC] manu at netbsd dot org
I updated the patch to just extract the nanosecond timestamp of UUID so that we do not introduce non digits in the output. That might have confused some script that do not expect it.

I also avoid goind the UUID way if more_entropy is set.
 [2017-07-08 14:56 UTC] manu at netbsd dot org
I just uploaded a third patch that improves the fix to the problem:
- uinqid() semantics is not altered: it still output a 14 digit unique identifier
- performance boost is obtained for all supported platforms, no uuidgen() support required.
- Cygwin requirement to use more_entropy=true can be removed

The idea is to poll time using gettimeofday() until the microsecond changes. On a modern system, that will cause a a few gettimeofday() system calls that last much shorter that usleep(1).

I measured a 10000-fold performance improvement on NetBSD, and I expect at least a 10-fold improvement on Linux.
 [2017-07-18 21:18 UTC] nikic@php.net
This looks like a good approach. I tested your patch on Linux against a tight uniqid() loop and got an improvement of approximately 1000x.

One technical note: As written the code is not thread safe. The prev_tv variable should be declared as a thread-local variable. To avoid messing with TSRM globals, it can be declared as ZEND_TLS.
 [2017-07-19 02:42 UTC] manu at netbsd dot org
I uploaded an updated patch against git, using ZEND_TLS.
 [2017-07-19 17:53 UTC] nikic@php.net
-Summary: uniqid performances +Summary: uniqid() without more_entropy performs badly
 [2017-07-19 17:55 UTC] nikic@php.net
Automatic comment on behalf of manu@netbsd.org
Revision: http://git.php.net/?p=php-src.git;a=commit;h=d25049cc1b74ae445d6521997a421a7462f1ad5b
Log: Fixed bug #74851: Improve uniqid() performance
 [2017-07-19 17:55 UTC] nikic@php.net
-Status: Open +Status: Closed
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 21 13:01:31 2024 UTC