php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Doc Bug #75559 array_unique() with SORT_REGULAR misbehaves with array of objects
Submitted: 2017-11-23 12:11 UTC Modified: 2017-11-23 16:35 UTC
Votes:4
Avg. Score:3.8 ± 0.8
Reproduced:3 of 3 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (33.3%)
From: geompse at gmail dot com Assigned:
Status: Open Package: Arrays related
PHP Version: 7.2.0RC6 OS: Debian
Private report: No CVE-ID: None
View Add Comment Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
You can add a comment by following this link or if you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: geompse at gmail dot com
New email:
PHP Version: OS:

 

 [2017-11-23 12:11 UTC] geompse at gmail dot com
Description:
------------
According to the documentation, array_unique() does the following:
    (string) $elem1 === (string) $elem2
When passed the SORT_REGULAR flag, the comparaison is done without changing types:
    $elem1 === $elem2

In a particuliar case maybe related to the length of the array, the unicity is not done, and duplicate objects subsists.

In the test script below, you can see that $a !== $b because $a is an instance of class A, and $b is an instance of class B
When adding multiple times the same object ($a) in an array, the array_unique/SORT_REGULAR does function properly with only one occurence kept.
When adding a different object ($b) the array_unique/SORT_REGULAR does not function properly, returning an array containing duplicates.

According to 3v4l.org the issue is reproductible in almost every versions of PHP ranging between 5.2.9 - 5.6.30 and 7.0.0 - 7.2.0rc6:
https://3v4l.org/fYtvR (only versions 5.6 & 7)
https://3v4l.org/sE03m (most versions)

On my side I will deprecate array_unique by adding it to the "disable_functions" directive, however I would really like to see this issue resolved.
Thank you.

Test script:
---------------
<?php

class A {};
$a = new A();

class B {};
$b = new B();

$array = array();
for($i=0; $i<28; $i++)
    $array[$i] = $a;
$array[0] = $array[26] = $b;

# var_dump($array);
$array = array_unique($array,SORT_REGULAR);
var_dump($array);

Expected result:
----------------
array(5) {
  [0]=>
  object(B)#2 (0) {
  }
  [1]=>
  object(A)#1 (0) {
  }
}

Actual result:
--------------
array(5) {
  [0]=>
  object(B)#2 (0) {
  }
  [1]=>
  object(A)#1 (0) {
  }
  [2]=>
  object(A)#1 (0) {
  }
  [13]=>
  object(A)#1 (0) {
  }
  [26]=>
  object(B)#2 (0) {
  }
}

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2017-11-23 15:12 UTC] requinix@php.net
-Type: Bug +Type: Documentation Problem
 [2017-11-23 15:12 UTC] requinix@php.net
Disabling use of the function because it doesn't work the way you expect in one particular case seems just a bit... I don't know... ridiculous? But it's your codebase so have fun.

If it wasn't already apparent, behind the scenes array_unique() sorts [a copy of] the array. That's how it's able to find duplicates in less than the O(n^2) time of the obvious but naive algorithm. Thing is, objects in PHP are not relatively comparable, meaning $a < $b and $a > $b are both false. This creates inconsistencies when it comes to sorting and the array basically gets shuffled - just like if you used usort() with a bad comparison function.

Scanning the sorted array for duplicates is a matter of comparing adjacent elements, but since the $a and $b objects aren't all next to each other it occasionally seems that there are new "unique" values.

I don't see room for improvement in the function: I don't think we can suddenly make objects comparable (and even if how would they work?) and sorting is the best general-purpose solution for finding uniques. So I think we should warn in the docs that using array_uniques() on an array of objects will only really work well with SORT_STRING/SORT_LOCALE_STRING and after making sure to implement __toString() in all the relevant classes.

Maybe we add a array_uunique() for a user-defined sorting comparison callback?
 [2017-11-23 15:36 UTC] geompse at gmail dot com
Our PHP application makes an heavy usage of array_unique/SORT_REGULAR with objects, and it works well most of the time. I believe this is relevant when using lots of objects, for example with a framework like Symfony.
When it occurs the bug looks random and was quite hard to reproduce.

I know why array_unique does a sorting, but I don't understand why there would sometimes be inconsistencies. Wouldn't there be inconsistencies all the time ? 

Using __toString is a nice alternative to stopping using array_unique.

To me adding a new constant SORT_REFERENCE would be optimal, since in our use cases we don't clone objects. It might even be faster than sorting by string, since there would be no access to the inner data of the ZVAL structure, only to its memory pointer.

I am very satisfied with your proposition of updating the documentation, even if I would appreciate a more detailled explanation of the inconsistencies.

A array_uunique may not be very usefull, it is very simple to create an associative array key=>object then to return an array_values of it, but maybe it would be a bit faster or more memory-friendly.
 [2017-11-23 16:35 UTC] requinix@php.net
> Wouldn't there be inconsistencies all the time ?
The inconsistency I'm talking about is when a comparison function contradicts itself. For example, if it claims $a < $b and $b < $c but then says $a > $c when the first two results clearly imply $a < $c. If that happens during sorting then the resulting array is undefined.

Demonstration: https://3v4l.org/WiTC0
Two things to note:
1. The final array isn't sorted in reverse order, as returning 1 from the comparison function might suggest it would.
2. Arrays of similar size are generally sorted in the same [incorrect] way, however there are occasional length thresholds where the ordering changes.

Whether issues like this happen "all the time" is hard to say, but I do know that at least 4 out of 5 bugs reported here involving custom sorting functions are NAB due to undefined behavior because of a bad comparison function.
 
PHP Copyright © 2001-2019 The PHP Group
All rights reserved.
Last updated: Sat Sep 21 15:01:27 2019 UTC