Doc Bug #21728 sort() sorts are unpredictable
Submitted: 2003-01-18 10:26 UTC Modified: 2004-05-21 15:55 UTC
From: andrey@php.net Assigned:
Status: Closed Package: Documentation problem
PHP Version: 4.4.0-dev OS: All
Private report: No CVE-ID: None
 [2003-01-18 10:26 UTC] andrey@php.net
 Today I closed bug #21444. To predict the output, the user has to master type juggling. I think it would be a good idea to add its example to the documentation as a comprehensive one.
The script follows (the explanation comes after it):
<?php
$arr1 = array("a","b","c","d","4",5,4,"true","TRUE",true);
sort($arr1);
var_dump($arr1);
?>

The output is:
array(10) {
  [0]=>
  bool(true)
  [1]=>
  int(4)
  [2]=>
  string(1) "4"
  [3]=>
  string(4) "TRUE"
  [4]=>
  string(1) "a"
  [5]=>
  string(1) "b"
  [6]=>
  string(1) "c"
  [7]=>
  string(1) "d"
  [8]=>
  string(4) "true"
  [9]=>
  int(5)
}
It may look strange that (int) 5 comes after all the strings. This is
because "4" is lower than (int) 5, "4" is before "true", and "true" is
before 5. The first two are obvious; the third is not: compared with an
integer, "true" is converted to 0, and 0 is lower than 5. So the result is OK.
It's better not to mix types in an array. If 5 is changed to "5", then
"5" goes right after "4".

Thanks

 [2003-01-18 11:48 UTC] philip@php.net
How about:

<?php
$arr1 = array("a","b","4",5,4,"true","TRUE",true, false, "c");
sort($arr1);
var_dump($arr1);
?>

Which gives:
array(10) {
  [0]=>
  bool(false)
  [1]=>
  string(4) "TRUE"
  [2]=>
  string(1) "a"
  [3]=>
  string(4) "true"
  [4]=>
  bool(true)
  [5]=>
  string(1) "b"
  [6]=>
  string(1) "c"
  [7]=>
  int(4)
  [8]=>
  string(1) "4"
  [9]=>
  int(5)
}

Which is weird as "4" looks misplaced.  For example in this:
<?php
$arr1 = array("a","b","4",5,4,"true","TRUE",true, false, "c", "d");
sort($arr1);
var_dump($arr1);
?>

We get different results (all I added was "d" to the end):

array(11) {
  [0]=>
  bool(false)
  [1]=>
  string(1) "4"
  [2]=>
  string(4) "TRUE"
  [3]=>
  string(1) "a"
  [4]=>
  string(1) "b"
  [5]=>
  string(1) "c"
  [6]=>
  string(1) "d"
  [7]=>
  string(4) "true"
  [8]=>
  bool(true)
  [9]=>
  int(4)
  [10]=>
  int(5)
}

Notice the different order. Is this a genuine bug?
 [2003-01-18 12:05 UTC] andrey@php.net
 As I said, this is a very complicated case because of the type juggling. I needed 30 minutes to realize that #21444 is not a bug but bogus (for me and Derick). I agree that the result is weird. I modified the compare function to see which comparisons are made. All of them look OK.
On my PHP I get the same results for the script with "d" added at the end. A small modification changes the order of comparisons, and thus the result is different. Maybe this is because the default sort type is SORT_REGULAR. If SORT_STRING is used, the result is as expected. I think the case I provided is good for showing users that the results are kinda unexpected when the array contains values of various datatypes and SORT_REGULAR is used. So users who sort such arrays have to be warned about the "unexpected" results.
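The SORT_STRING suggestion above can be tried with a minimal sketch like the one below. It reuses the array from the original report; the variable name $asStrings is only an illustration. With SORT_STRING every element is compared as a string, so the comparison no longer depends on which two types happen to meet.

```php
<?php
// The mixed-type array from the original report.
$arr1 = array("a", "b", "c", "d", "4", 5, 4, "true", "TRUE", true);

// SORT_STRING compares every element as a string, so true is compared
// as "1" and the integers 4 and 5 are compared as "4" and "5".
sort($arr1, SORT_STRING);

// Cast to strings for display only; the sorted elements keep their types.
$asStrings = array_map('strval', $arr1);
echo implode(", ", $asStrings), "\n";
// prints: 1, 4, 4, 5, TRUE, a, b, c, d, true
```

The relative position of int 4 and string "4" is not shown here, since both compare equal as strings.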
 [2003-01-18 12:17 UTC] philip@php.net
I swear I get different results by just adding a "d" to the end.  This should not happen.
 [2003-01-18 12:23 UTC] andrey@php.net
 Maybe it should not happen, but as I said the comparisons done are correct (extensive type juggling). Maybe the default way to sort should be SORT_STRING rather than SORT_REGULAR.

Comments from other people are welcome :)
 [2003-01-20 11:54 UTC] m dot ford at lmu dot ac dot uk
Well, one of the problems here is that some of the array elements will take different values in an element-to-element comparison depending on the type of the other element. For example, "true" will be just that compared to another string, but 0 when compared against an integer; strings and integers are both converted to Boolean when compared to true/false (with resulting loss of significant information).
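A small sketch of this "dual value" effect, using only the loose comparison operator (the variable names are illustrative). Two values can each be equal to true and still not be equal to each other, which means the comparison does not define one consistent order:

```php
<?php
// A non-empty string is cast to bool when compared with a bool.
$boolVsString = (true == "b");

// A non-zero int is also cast to bool when compared with a bool.
$boolVsInt = (true == 4);

// With no bool involved, "b" and 4 are compared by different rules
// and are not equal.
$stringVsInt = ("b" == 4);

var_dump($boolVsString, $boolVsInt, $stringVsInt);
// prints: bool(true) bool(true) bool(false)
```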

Another problem is that if you're using a non-sequential sorting algorithm (such as shellsort or quicksort), simply changing the length of the array will probably change which element is compared to which, and hence, because of the strangeness of "dual values" caused by type-juggling, the final order of the array.  (This may be even worse for an algorithm that is not guaranteed to maintain the order of equal items.)

If you take a look at the sorted versions of each array cited, you will find that all of the element-to-neighbour-element comparisons are actually valid, thus:

array("a","b","c","d","4",5,4,"true","TRUE",true) --
  true   : 4       ==>  (bool)    true   == true
  4      : "4"     ==>  (int)     4      == 4
  "4"    : "TRUE"  ==>  (string)  "4"    <  "TRUE"
  "TRUE" : "a"     ==>  (string)  "TRUE" <  "a"
  "a"    : "b"     ==>  (string)  "a"    <  "b"
  "b"    : "c"     ==>  (string)  "b"    <  "c"
  "c"    : "d"     ==>  (string)  "c"    <  "d"
  "d"    : "true"  ==>  (string)  "d"    <  "true"
  "true" : 5       ==>  (int)     0      <  5

array("a","b","4",5,4,"true","TRUE",true, false, "c") --
  false  : "TRUE"  ==>  (bool)   false  <  true
  "TRUE" : "a"     ==>  (string) "TRUE" <  "a"
  "a"    : "true"  ==>  (string) "a"    <  "true"
  "true" : true    ==>  (bool)   true   == true
  true   : "b"     ==>  (bool)   true   == true
  "b"    : "c"     ==>  (string) "b"    <  "c"
  "c"    : 4       ==>  (int)    0      <  4
  4      : "4"     ==>  (int)    4      == 4
  "4"    : 5       ==>  (int)    4      <  5

array("a","b","4",5,4,"true","TRUE",true, false, "c", "d") --
  false  : "4"     ==>  (bool)    false  <  true
  "4"    : "TRUE"  ==>  (string)  "4"    <  "TRUE"
  "TRUE" : "a"     ==>  (string)  "TRUE" <  "a"
  "a"    : "b"     ==>  (string)  "a"    <  "b"
  "b"    : "c"     ==>  (string)  "b"    <  "c"
  "c"    : "d"     ==>  (string)  "c"    <  "d"
  "d"    : "true"  ==>  (string)  "d"    <  "true"
  "true" : true    ==>  (bool)    true   == true
  true   : 4       ==>  (bool)    true   == true
  4      : 5       ==>  (int)     4      <  5

So, in each case, we have a valid sort -- just a *different* valid sort.  The prime determiners here seem to be the non-sequential order in which the individual comparisons are performed, and, as has been indicated, the automatic casting that takes place for each one.
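The "different valid sort" observation can be reproduced on a smaller scale. The sketch below uses a hypothetical helper, neighboursInOrder(), that checks only adjacent pairs with the default loose comparison, the same check applied in the listings above. The two sample orderings are chosen for illustration so that every neighbour pair involves a bool; the function signature assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: returns true when no element is greater than its
// right-hand neighbour under PHP's default (loose) comparison.
function neighboursInOrder(array $values): bool
{
    $values = array_values($values);
    for ($i = 1; $i < count($values); $i++) {
        if ($values[$i - 1] > $values[$i]) {
            return false;
        }
    }
    return true;
}

// The same four elements in two different orders.
$orderA = array(false, 4, true, "a");
$orderB = array(false, "a", true, 4);

var_dump(neighboursInOrder($orderA)); // bool(true)
var_dump(neighboursInOrder($orderB)); // bool(true)
```

Both orders pass the neighbour check, so a sort routine is free to return either one depending on which pairs it happens to compare.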

(Incidentally, whilst putting the above together I was unable to find a definitive listing of *exactly* what automatic type-conversions take place in which contexts.  This is a definite oversight, as in contexts like the above it's important to know, for example, that comparing an int to a bool will cast the int to bool, and not the bool to int.  Perhaps this needs to become a doc problem for the inclusion of such a list or table?)
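The direction of the int/bool conversion mentioned here can be checked with a two-line sketch (variable names are illustrative). If the bool were cast to int, 2 would be compared with 1 and the first result would be false:

```php
<?php
// Loose comparison of int and bool: the int is cast to bool (2 becomes true).
$intCastToBool = (2 == true);

// Forcing the opposite direction by hand: true becomes 1, and 2 != 1.
$boolCastToInt = (2 == (int) true);

var_dump($intCastToBool, $boolCastToInt);
// prints: bool(true) bool(false)
```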

Hope this enlightens at least some souls reading this far!

Cheers!

Mike
 [2003-01-20 12:00 UTC] andrey@php.net
 I had (and still have) a suspicion about this because the sort is done with the qsort algorithm. As you said, all the comparisons are valid ones, but the order is unpredictable.
Thanks for the comment.

I think we should rethink the sort() function and maybe change the default way of sorting (from SORT_REGULAR to SORT_STRING).
 [2003-02-05 02:34 UTC] philip@php.net
Reclassified as a sort() problem as this is too weird and unpredictable.  Maybe php-dev can confirm whether this is how sort() should behave and will continue to behave.  Mike, nice response but I still don't like this ;)
 [2003-02-06 05:21 UTC] mgf@php.net
I agree.  However, in order to give a wholly reliable sort order for elements of mixed type, I think the only real option is to use a method that does primary sort on type, with secondary sort on value only when types are equal.  I suppose there might just be a case for providing an additional sort type for this, but, given that it can be implemented in userland with a type-checking callback to usort(), I'm not totally convinced.
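A minimal sketch of such a userland callback, assuming PHP 7 or later syntax (the thread itself predates it). The function name compareByTypeThenValue() and the type ranking are illustrative assumptions, not an existing API; any fixed ranking of types would give a reliable order.

```php
<?php
// Hypothetical usort() callback: primary sort on type, secondary sort on
// value only when both elements have the same type.
function compareByTypeThenValue($a, $b): int
{
    // Assumed ranking; types not listed here sort last.
    static $rank = array('boolean' => 0, 'integer' => 1, 'double' => 2, 'string' => 3);

    $rankA = $rank[gettype($a)] ?? 4;
    $rankB = $rank[gettype($b)] ?? 4;
    if ($rankA !== $rankB) {
        return $rankA <=> $rankB;
    }

    // Same type: strings are compared byte by byte so that numeric-looking
    // strings get no special treatment; other types use the default comparison.
    if (is_string($a)) {
        return strcmp($a, $b);
    }
    return $a <=> $b;
}

$arr1 = array("a", "b", "4", 5, 4, "true", "TRUE", true, false, "c", "d");
usort($arr1, 'compareByTypeThenValue');
var_dump($arr1);
// Order: false, true, 4, 5, "4", "TRUE", "a", "b", "c", "d", "true"
```

Because no comparison ever crosses a type boundary, no type juggling takes place and the result does not depend on the order in which pairs are compared.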

At the very least, I think we need a big fat warning in the docs about the hazards of mixed-type sorting!

Cheers!

Mike
 [2004-05-21 15:55 UTC] nlopess@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.


 