php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #48224 array_rand() broken sorting semantics
Submitted: 2009-05-10 15:07 UTC Modified: 2009-05-10 16:46 UTC
From: theultramage at gmail dot com Assigned: colder
Status: Closed Package: Arrays related
PHP Version: 5.*, 6CVS (2009-05-09) OS: *
Private report: No CVE-ID:
 [2009-05-10 15:07 UTC] theultramage at gmail dot com
Description:
------------
The code of PHP_FUNCTION(array_rand) contains a conditional call to array_data_shuffle ('shuffle()' function). It was added in http://cvs.php.net/viewvc.cgi/php-src/ext/standard/array.c?r1=1.79&r2=1.80& to resolve bug http://bugs.php.net/bug.php?id=7492 . There are several mistakes here:

1. The developer used the comparison (num_req_val == num_avail). However, these are not constants and change a lot during the execution of the algorithm. While it is true that the check holds true in the case where the user requests all elements in the array, it also holds true in the general case where both of these variables reach the value '0' at the end of the loop, which is 10+% of the time and occurs randomly. This causes array_rand to produce sequences that are "in ascending order, but sometimes completely random". See http://php.net/manual/en/function.array-rand.php#74406.

2. The developer should have never made this change in the first place. The user's complaint was "If I ask array_rand() to pick 6 elements out of 6, the result isn't randomized". But here the user failed to realize that it's not array_rand's job to produce the results randomly ordered. It's to "pick K random entries out of N". Therefore the change 1.80 was out of scope of what array_rand was intended for. The developer should have instead instructed the user to use shuffle() on the result if he required a shuffled result.

My proposal: Remove the two lines that do the shuffling in array_rand's code, since they're broken and don't make sense even if they were fixed. It's not the function's task to randomize the output's order - that should be done by the user if he needs to do so.

PS: A more detailed analysis can be found at http://netvor.sk/~umage/wtf/www/php/array_rand.html .

Reproduce code:
---------------
<?php
	// also see http://php.net/manual/en/function.array-rand.php#74406

	// test case for '3 out of 10'... try this and then try '8 out of 10'
	$n = 20;
	$a = array(0,1,2,3,4,5,6,7,8,9);

	for( $i = 0; $i < $n; $i++ )
	{
	  $keys = array_rand($a,3);
	  printf("%d %d %d\n", $keys[0], $keys[1], $keys[2]);
	}
?>

Expected result:
----------------
Individual sequences should appear in ascending order, since that's how the original code generated them.

Actual result:
--------------
The sequences are mostly sorted, sometimes shuffled. This occurs randomly on each run. The percentage of shuffled sequences increases as the requested number nears the number of entries in the array.

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2009-05-10 16:30 UTC] jani@php.net
Assigning to Andrei who did that commit you say was wrong. Let him 
decide what to do with this. :)
 [2009-05-10 16:46 UTC] colder@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

I removed the shuffle altogether, it seems to be the only way for it to make sense.
 
PHP Copyright © 2001-2014 The PHP Group
All rights reserved.
Last updated: Thu Apr 24 19:01:53 2014 UTC