php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #73427 Unintended behaviour of intersection functions
Submitted: 2016-10-31 08:14 UTC Modified: 2018-07-18 05:58 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: turabgarip at gmail dot com Assigned:
Status: Not a bug Package: Arrays related
PHP Version: 5.6.27 OS: Windows
Private report: No CVE-ID: None
 [2016-10-31 08:14 UTC] turabgarip at gmail dot com
Description:
------------
Intersection functions intend to compute the intersection of two or more arrays; but they try to compare the elements of the individual array itself. This is contradictory to the logic of intersection. Thus it makes the intersection functions considerably slow.

In the test script, there are two arrays. Intersection will start to compare from the first three colors like; "blue" to "green", "black" to "blue"; that are, the elements of the first array only; which are actually very unnecessary to compare. Because the actual intention is to compute the intersection of the two arrays; not the intersection of the elements of the array itself. However an array can contain only one identical element at once anyway, so it's useless and resource consuming to check the array's own elements against each other.

Test script:
---------------
$arr1 = array('green', 'blue', 'black');
$arr2 = array('yellow', 'red', 'cyan');

array_intersect($arr1, $arr2);

Expected result:
----------------
Intersection function should compare the elements of the given arrays crosswise.


Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2018-05-21 06:24 UTC] turabgarip at gmail dot com
Wow, it's been years and no one actually took a glance on it? :)
 [2018-05-21 06:46 UTC] requinix@php.net
-Status: Open +Status: Feedback
 [2018-05-21 06:46 UTC] requinix@php.net
I don't see the "compare from the first three colors" behavior. Where/how are you seeing that?
And I assume this also applies to PHP 7.1 and 7.2?
 [2018-06-24 04:25 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 [2018-07-17 18:10 UTC] turabgarip at gmail dot com
To test the bug, you can write a callback to print the compared elements to use with array_uintersect() like;

$arr1 = array('green', 'blue', 'cyan');
$arr2 = array('yellow', 'red', 'cyan');

$i = 1;
function print_elements($e1, $e2) {
	global $i;
	echo $i . '. ' . $e1 . ' compared to ' . $e2 . '<br>';
	$i++;
	if ($e1 == $e2) return 0;
	else return 1;
}

print_r(array_uintersect($arr1, $arr2, 'print_elements'));

The result will be;

1. green compared to blue
2. cyan compared to blue
3. green compared to cyan
4. yellow compared to red
5. cyan compared to red
6. yellow compared to cyan
7. blue compared to red
8. blue compared to cyan
9. blue compared to yellow

Array
(
)


You see that print_elements() is called for 9 times and green compares to blue; which are the elements of the first array only. And there are also missing comparisons; as you can see cyan in $arr1 didn't compare to cyan in $arr2 and thus the intersection returned an empty array. If you increase the elements in arrays; you will see that print_element() will be called for interesting times; which is obviously not the combination calculation of the arrays.

This applies to all versions as far as I can see. I tested with PHP 5.6.x and PHP 7.1

By the way I lost the password for changing the bug status; would you help on that?
 [2018-07-18 05:58 UTC] requinix@php.net
-Status: No Feedback +Status: Not a bug
 [2018-07-18 05:58 UTC] requinix@php.net
Not sure if Rasmus is still working on the host changes for bugs.php.net...

1. Your comparison function is faulty. It will incorrectly claim that two unequal values are greater than each other, and contradictions like that result in undefined behavior. It must be fixed for anything here to have meaning.
As of PHP 7 you can use the <=> operator like
  return $e1 <=> $e2;

2. The comparison function is used to sort the arrays. This is a matter of efficiency: sorting both arrays and comparing their elements is an O(n log n) algorithm while simply comparing pairs of elements is O(n^2). The output comes in three parts: sorting the first array (1-3), sorting the second array (4-6), and the intersection work (7-9).

Fixing the comparison results in https://3v4l.org/SgodV (which has a couple other alterations)
* 1-3 sorts the first array
* 4-5 sorts the second array
* 6-10 performs the intersection

If you look carefully you might spot two unexpected comparisons:
* 8 (cyan1 <=> cyan2) is repeated
* 9 (cyan1 <=> green) is comparing two elements in the first array

Both have the same explanation. After comparing two elements and finding equality, the algorithm then advances the internal pointer for the first sorted array until it finds a new element to compare - in case there were multiple cyan values. The first cyan/cyan comparison (comparison 7) was done normally and the second (8) was done during the advancement phase; it is redundant, yes, but removing this could require a fair bit of refactoring that is arguably not worth the effort and added complexity. The next comparison (9) is the algorithm looking to find the next value in the first array to use, after which it immediately (10) resumes the intersection work.

Make sense?
 
PHP Copyright © 2001-2020 The PHP Group
All rights reserved.
Last updated: Thu Apr 09 18:01:24 2020 UTC