php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #45184 algorithm to limit random numbers to a certain range is flawed
Submitted: 2008-06-05 10:22 UTC Modified: 2017-07-28 22:45 UTC
Votes:7
Avg. Score:4.3 ± 1.4
Reproduced:5 of 7 (71.4%)
Same Version:2 (40.0%)
Same OS:1 (20.0%)
From: kala at sankya dot com Assigned: leigh (profile)
Status: Closed Package: Math related
PHP Version: 5.6.11 OS: *
Private report: No CVE-ID: None
 [2008-06-05 10:22 UTC] kala at sankya dot com
Description:
------------
In the implementation of functions rand(min,max) and mt_rand(min,max), the algorithm used to reduce the number to the requisite range is (min + unscaled number*(max-min+1)/(RAND_MAX+1)). This is an inherently flawed implementation since it will result in some numbers occurring more frequently than others when the range of the unscaled numbers (RAND_MAX+1) is not an exact multiple of the requested range (max-min+1). The author seems to be aware of the standard problem associated with using a modulus function (as per the comments in the source code). But this implementation also suffers from exactly the same problem, the only difference is that numbers that can occur more frequently are different from the ones (the lower portions of the range) if one takes a simple modulus. See the C program given below for a demonstration of the same.

The correct way to implement this is to find a maximum value that is  an exact multiple of the requested range, and when drawing an unscaled number discard numbers above this range and draw a new number instead. This way, the pigeonhole principle that results in this "modulo bias" will be eliminated. Check out the URL http://java.sun.com/javase/6/docs/api/java/util/Random.html#nextInt(int) for a description of this method (as implemented correctly by Java).



Reproduce code:
---------------
/* This is a C program that demonstrates the problem with the PHP implementation of random number scaling - The size of the unscaled random function has been reduced to 16 bits so that the program runs in reasonable time. Assuming that the raw rand() function produces every number in the range with equal probability, this program counts the number of occurrences of a scaled call with range of 10 (i.e rand(0,9); ) when each number in the unscaled range occurs exactly once. 
*/

#include <stdio.h>
#include <string.h>

/* Test numbers in the range 0 to 9 - so max-min+1 = 10  */
#define REQUESTED_RANGE 10

/* This should really be 31, but would take too long to run, so for demonstration purposes use 16 bits */
#define UNSCALED_BITS 16

int counts[REQUESTED_RANGE];
unsigned long long MAX_RANGE = 1L << UNSCALED_BITS; // Same as RAND_MAX+1

void print_results(char *msg)
{
	int i;
	printf(msg);
	for (i=0; i < REQUESTED_RANGE; i++)
	{
		printf("%d ",counts[i]);
	}
	printf("\n");	
}

int main()
{
	unsigned int i;
	double divisor = MAX_RANGE;
	// Modulo method
	memset(counts,0,sizeof(counts));
	for (i=0; i < MAX_RANGE; i++)
	{
		int val = i % REQUESTED_RANGE;
		counts[val]++;
	}
	print_results("Output using Modulo : ");
	memset(counts,0,sizeof(counts));
	for (i=0; i < MAX_RANGE; i++)
	{
		int val = (i/divisor)*10;
		counts[val]++;
	}
	print_results("Output with scaling : ");
}


Expected result:
----------------
Results of this run :
Output using Modulo : 6554 6554 6554 6554 6554 6554 6553 6553 6553 6553 
Output with scaling : 6554 6554 6553 6554 6553 6554 6554 6553 6554 6553 

As can be noticed, the PHP implementation (line 2) results in exactly the same problem as the simple modulus function (line 1) , the only difference being that the numbers that have higher frequency have changed. (0,1,3,5,6,8 instead of 0,1,2,3,4,5).



Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2008-06-05 10:37 UTC] pajoye@php.net
Have you tried to use mt_rand instead?

Rand is known for having "issues".
 [2008-06-05 11:30 UTC] kala at sankya dot com
@pajoye : Yes I'm aware of the limitations of rand() (and LCG generators in general), and have used MT extensively. The problem here isn't with the RNG but with the algorithm used in converting the raw 31 bit random number to one within a given range. This algorithm is common to both the rand() and mt_rand() functions that take min/max arguments (both uses the RANGE_RANDOM macro defined in the file "ext/standard/php_rand.h") . The only work around is to use the no argument variant to get a 31 bit number and and do the scaling yourself. (But obviously the better solution would be to fix the code itself :) )
 [2008-06-05 11:49 UTC] kala at sankya dot com
Sorry the line 

    int val = (i/divisor)*10;

in the example program should be :

    int val = (i/divisor)*REQUESTED_RANGE;

for the program to work correctly for any other range other than 10.
 [2008-06-18 16:21 UTC] pajoye@php.net
Would you be interested in improving our code?
 [2010-05-21 12:14 UTC] mike@php.net
-Status: Assigned +Status: Feedback
 [2010-05-21 12:14 UTC] mike@php.net
Should probably be in feedback state.
 [2013-02-18 00:33 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.
 [2015-08-05 13:23 UTC] cmb@php.net
-Status: No Feedback +Status: Open -Operating System: Linux +Operating System: * -PHP Version: 5.2.6 +PHP Version: 5.6.11
 [2015-08-05 13:23 UTC] cmb@php.net
Indeed, as pointed out by the reporter, RANGE_RANDOM() is flawed.
The new CSPRNG function random_int() seems to get the scaling
right[1].

[1] <https://github.com/php/php-src/blob/php-7.0.0beta2/ext/standard/random.c#L184-L197>
 [2017-07-28 22:19 UTC] kala at sankya dot com
Since this bug hasn't been addressed even as of PHP version 7, looks very unlikely that it will ever be fixed. Since the only way out is to implement your own function for unbiased scaling, here is a replacement function that performs unbiased scaling :

function mt_rand_x($min,$max)
{
    $range = $max - $min + 1;
    $mt_max = mt_getrandmax()+1;
    $limit = $mt_max-($mt_max % $range);

    do
    {
        $rnd = mt_rand();
    } while ($rnd >= $limit);

    return ($rnd % $range)+$min;
}

Note : This is the replacement for the mt_rand($min,$max) function. For implementing an unbiased replacement for the rand($min,$max) function, change the mt_rand() calls to rand() and the mt_getrandmax() call to getrandmax().
 [2017-07-28 22:24 UTC] nikic@php.net
-Status: Assigned +Status: Closed
 [2017-07-28 22:24 UTC] nikic@php.net
This has been fixed in PHP 7.1.
 [2017-07-28 22:45 UTC] requinix@php.net
-Assigned To: pajoye +Assigned To: leigh
 
PHP Copyright © 2001-2019 The PHP Group
All rights reserved.
Last updated: Sun Dec 15 10:01:25 2019 UTC