php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Doc Bug #81692 array_unique bug with mixed scalar values
Submitted: 2021-12-03 11:30 UTC Modified: 2021-12-06 07:10 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: stanislav dot eismont at gmail dot com Assigned:
Status: Verified Package: Arrays related
PHP Version: 7.4.26 OS: ubuntu 20.04
Private report: No CVE-ID: None
View Add Comment Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
You can add a comment by following this link or if you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: stanislav dot eismont at gmail dot com
New email:
PHP Version: OS:

 

 [2021-12-03 11:30 UTC] stanislav dot eismont at gmail dot com
Description:
------------
array_unique returns incorrect result with SORT_REGULAR flag.

Test script:
---------------
<?php
$array = ["red", "color" => "red", 78, 78, 12, "12", false, null, false, 0, true, 1, 0, null, 2.12, "2.12", 8, "-s"];
var_dump(array_unique($array, SORT_REGULAR));


Expected result:
----------------
Expected one "red" value printed:


array(11) {
  [0]=>
  string(3) "red"

// ...

}

Actual result:
--------------
Actually "red" printed twice:

array(11) {
  [0]=>
  string(3) "red"
  ["color"]=>
  string(3) "red"

// ...

}

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2021-12-03 12:25 UTC] cmb@php.net
-Status: Open +Status: Verified -Assigned To: +Assigned To: cmb
 [2021-12-03 12:25 UTC] cmb@php.net
Indeed, broken as of PHP 7.0.0: <https://3v4l.org/PXQZe>.
 [2021-12-03 15:13 UTC] requinix@php.net
Isn't this just a result of inconsistent comparisons during sorting? The sorted arrays don't place the two reds next to each other with SORT_REGULAR; they do if you use SORT_STRING, which also produces the expected unique array.
https://3v4l.org/JTQMr
https://3v4l.org/gPaPn
 [2021-12-03 15:30 UTC] cmb@php.net
-Type: Bug +Type: Documentation Problem -Assigned To: cmb +Assigned To:
 [2021-12-03 15:30 UTC] cmb@php.net
> Isn't this just a result of inconsistent comparisons during
> sorting?

Right, although PHP 5 produced the required results[1].

Anyhow, SORT_REGULAR has a fundamental flaw when working on
arbitrarly mixed scalar types, namely that the result is not
necessarily in monotonic order, i.e. the sort has arbitrary
results (depending on the sorting algorithm).  E.g. consider ['2',
'a', 1]; the following holds prior to PHP 8.0.0:

    '2' < 'a'
    'a' < 1
    '2' > 1

As of PHP 8.0.0, this is no longer the case for the given example
(thanks to the saner string to number comparisons), but consider
another example: ['!', '0', true]; the following holds:

    '!' < '0'
    '0' < true
    '!' == true

That still doesn't work.

So the current O(log n) algorithm of array_unique(), would need to
be changed to an O(n²) algorithm, or the sorting algorithm would
need to be changed back to what we had in PHP 5.  Neither option
is desireable, and I think we should just document this
limitation.  (And we also should fix the note in the description
section, which only applies for SORT_STRING.)

[1] <https://3v4l.org/RfFJr>
 [2021-12-03 15:47 UTC] stanislav dot eismont at gmail dot com
To me it isn't look like a doc related problem. I gave such a big array as an example for a reason. If you change this array somehow, let's say remove last element, then there will be one "red" value printed https://3v4l.org/tpjMt.
 [2021-12-03 15:58 UTC] requinix@php.net
-Summary: array_unique bug +Summary: array_unique bug with mixed scalar values
 [2021-12-03 15:58 UTC] requinix@php.net
> If you change this array somehow,

Changing the array will change the exact sequence of comparison operations that PHP performs in order to sort the array (which it needs to achieve the O(log n) efficiency that @cmb mentioned). The change may result in the two "red"s sorting consecutively, allowing array_unique to correctly deduplicate them, or it may not.

The problem is in how SORT_REGULAR works - but it's also behaving as intended. The solution is to not use SORT_REGULAR when you have arrays containing mixed scalars.
 [2021-12-06 07:10 UTC] stanislav dot eismont at gmail dot com
Hmm, okay, thanks for the explanation. Now I realize that despite the possibility to fix array_unique(), for example, by choosing a different algorithm, the problem will still exist in sort() with the SORT_REGULAR flag. Unfortunately, I don't know what to do about it.
 [2021-12-31 01:00 UTC] a at b dot c dot de
Maybe I'm bikeshedding, but could the current more performant algorithm be attempted, but set a flag if a problematic potentially-incomparable mix of types is encountered. Then if the flag ends up set, take the "partially-uniquified" array that has already been built (don't want to throw away the work that has already been done) and make a second pass with a slower algorithm that doesn't rely on sort order making sense.

That would at least avoid bad performance on homogeneously-typed arrays.

(I thought of sort|uniq each type separately and then combine the results, but now I'm overthinking.)
 
PHP Copyright © 2001-2022 The PHP Group
All rights reserved.
Last updated: Tue Jan 18 20:03:14 2022 UTC