Doc Bug #72835 Note on min/max's page misleading about string comparisons
Submitted: 2016-08-15 03:52 UTC Modified: 2016-08-15 10:06 UTC
From: j dot david dot rothschild at gmail dot com Assigned: cmb
Status: Closed Package: *Math Functions
PHP Version: 7.0.9 OS: Windows, Linux
Private report: No CVE-ID: None
 [2016-08-15 03:52 UTC] j dot david dot rothschild at gmail dot com
Description:
------------
I'm experiencing an issue where calling max() on an array of strings produces an incorrect result. I understand that with an array of strings, max() compares the strings alphanumerically and picks the "highest". But this is not always the case.

Out of curiosity, I ran sort() on the array to see how the alphanumeric sort was playing out, and here too the results are not what you would expect.

I ran the test script below on 3v4l.org and found that with this example max() fails on every version, from the latest down. Interestingly, however, sort() returns the expected result on versions <= 5.6.24 (but not on 7.0.0 and later).


Test script:
---------------
<?php
//see https://3v4l.org/93ac5

$new_array = array(
  "8000",
  "12345",
  "6811-1",
  "7031",
  "6841"
);
var_dump(max($new_array));  // would expect "8000", but returns "7031"

sort($new_array);     // let's see how the alphanum sort is looking at this...
var_dump($new_array); // and it's wonky... why isn't "8000" at the end?

Expected result:
----------------
string(4) "8000"
array(5) {
  [0]=>
  string(5) "12345"
  [1]=>
  string(6) "6811-1"
  [2]=>
  string(4) "6841"
  [3]=>
  string(4) "7031"
  [4]=>
  string(4) "8000"
}

Actual result:
--------------
string(4) "7031"
array(5) {
  [0]=>
  string(4) "8000"
  [1]=>
  string(5) "12345"
  [2]=>
  string(6) "6811-1"
  [3]=>
  string(4) "6841"
  [4]=>
  string(4) "7031"
}

History

 [2016-08-15 05:48 UTC] requinix@php.net
-Summary: Alphanumeric sort inconsistent
+Summary: Note on min/max's page misleading about string comparisons
-Type: Bug
+Type: Documentation Problem
-Package: *General Issues
+Package: *Math Functions
 [2016-08-15 05:48 UTC] requinix@php.net
tl;dr: This is expected behavior. It stems from the fact that two numeric strings are compared as numbers, and I think the note's statement that "multiple string values will be compared alphanumerically" is inaccurate.


As the note on the max() page says, the function uses the standard comparison rules, and those say that comparing a string with a string does a "numeric or lexical comparison". That means two numeric strings are compared as numbers; otherwise they're compared as strings.

To demonstrate, consider var_dump("0123" <=> "123"). If the values were compared as strings, it would show -1 because "0" < "1", but instead it shows 0 because they are compared as numbers and 123 == 123. On the other hand, consider var_dump("678-9" <=> "12345"). Compared as numbers it would show -1 because 678 < 12345, but it actually shows 1 because "6" > "1".
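A minimal sketch of the two comparisons above, assuming PHP 7.0 or later (needed for the `<=>` operator). `is_numeric()` is used here only to show which operands PHP treats as numeric strings; the variable names are illustrative.

```php
<?php
// Both operands are numeric strings, so PHP compares them as numbers: 123 == 123.
$a = "0123" <=> "123";
var_dump($a); // int(0)

// "678-9" is not a numeric string, so PHP falls back to a byte-wise
// string comparison, where "6" sorts after "1".
$b = "678-9" <=> "12345";
var_dump($b); // int(1)

// Which operands count as numeric strings.
var_dump(is_numeric("0123"), is_numeric("678-9")); // bool(true), bool(false)
```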

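Your example array mixes numeric and non-numeric strings, and that causes problems when determining the maximum value. max() walks the array and compares each element with the largest value found so far, so some steps compare numbers and others compare strings: https://3v4l.org/MteU2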
1. "8000" < "12345" (compared as numbers)
2. "12345" < "6811-1" (compared as strings)
3. "6811-1" < "7031" (compared as strings)
4. "7031" > "6841" (compared as numbers)
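The four steps above can be replayed by hand. This is a minimal sketch, assuming PHP 7.0 or later; `trace_max()` is a hypothetical helper (not part of PHP) that mimics how max() keeps the larger of the current best value and the next element. It uses `is_numeric()` only as an approximation of PHP's internal numeric-string check when labelling each step.

```php
<?php
// Hypothetical helper: replays max()'s left-to-right scan and logs which
// kind of comparison each step uses. is_numeric() approximates PHP's
// internal "numeric string" test; it only affects the log label.
function trace_max(array $values): array
{
    $best = array_shift($values);
    $log = [];
    foreach ($values as $candidate) {
        $kind = (is_numeric($best) && is_numeric($candidate)) ? 'number' : 'string';
        // The same loose comparison max() relies on.
        $bigger = $candidate > $best;
        $log[] = sprintf('"%s" vs "%s" (%s): %s wins',
            $best, $candidate, $kind, $bigger ? $candidate : $best);
        if ($bigger) {
            $best = $candidate;
        }
    }
    return [$best, $log];
}

$input = ["8000", "12345", "6811-1", "7031", "6841"];
list($best, $log) = trace_max($input);
echo implode(PHP_EOL, $log), PHP_EOL;
var_dump($best);                 // string(4) "7031"
var_dump($best === max($input)); // bool(true)
```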

If you changed that third value to "6811.1", all comparisons would be numeric and the maximum value would be "12345". https://3v4l.org/4aClQ
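A quick check of that claim, assuming PHP 7.0 or later. `array_filter()` with `is_numeric` is used only to confirm that every element is now a numeric string.

```php
<?php
// Same array as in the report, with "6811-1" replaced by "6811.1".
$values = ["8000", "12345", "6811.1", "7031", "6841"];

// Every element is now a numeric string, so every comparison is numeric.
$allNumeric = count(array_filter($values, 'is_numeric')) === count($values);
var_dump($allNumeric);  // bool(true)
var_dump(max($values)); // string(5) "12345"
```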

Naturally, the string/numeric comparison affects sorting too. It introduces a cycle: "8000" < "12345" < "6811-1" < "7031" < "8000". That plays havoc with sorting, because a sort makes many more comparisons than a simple scan through the array does. The sorting algorithm changed in PHP 7, which is why the sorted array is different; however, in all versions the array is sorted (in a sense), because each consecutive pair is ordered correctly relative to the other.
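The cycle can be confirmed directly. A minimal sketch, assuming PHP 7.0 or later; it checks each of the four links with PHP's ordinary `<` operator, and the variable names are illustrative.

```php
<?php
// The four pairs that form the cycle. Pairs involving "6811-1" are
// compared as strings; the others are compared as numbers.
$cycle = [
    ["8000", "12345"],   // numeric: 8000 < 12345
    ["12345", "6811-1"], // string:  "1" < "6"
    ["6811-1", "7031"],  // string:  "6" < "7"
    ["7031", "8000"],    // numeric: 7031 < 8000
];

$allLess = true;
foreach ($cycle as list($left, $right)) {
    $less = $left < $right;
    $allLess = $allLess && $less;
    printf("\"%s\" < \"%s\": %s\n", $left, $right, var_export($less, true));
}
// Every link holds, so "8000" ends up "less than" itself: no consistent order exists.
var_dump($allLess); // bool(true)
```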

So everything is behaving as expected. sort() offers a sort-mode flag, and SORT_STRING will sort the array as you're expecting <https://3v4l.org/oTZ8C>. max() doesn't have a similar flag, so if you need to deal with mixed arrays, I suggest emulating it with rsort() and SORT_STRING. If anything, that particular "compared alphanumerically" bit is misleading and should be changed or removed, possibly replaced with a warning about the string/numeric comparison decision. The same applies to min()'s documentation. sort() already has such a warning in place.
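A minimal sketch of the suggested workaround, assuming PHP 7.0 or later and a non-empty array. The `sort()`/`rsort()` calls with `SORT_STRING` follow the suggestion above; `string_max()` is a hypothetical helper (not part of PHP) showing a single-pass alternative with `strcmp()` that avoids sorting the whole array.

```php
<?php
$values = ["8000", "12345", "6811-1", "7031", "6841"];

// SORT_STRING forces a plain string comparison for every pair,
// so the ordering is consistent whether or not a value is numeric.
$sorted = $values;
sort($sorted, SORT_STRING);
print_r($sorted); // order: 12345, 6811-1, 6841, 7031, 8000

// Emulating a "string max": sort descending with SORT_STRING, take the first element.
$desc = $values;
rsort($desc, SORT_STRING);
$stringMax = $desc[0];
var_dump($stringMax); // string(4) "8000"

// Hypothetical alternative: one pass with strcmp(), no sorting.
// Assumes a non-empty array (an empty one would make array_shift() return null).
function string_max(array $values): string
{
    $best = array_shift($values);
    foreach ($values as $v) {
        if (strcmp($v, $best) > 0) {
            $best = $v;
        }
    }
    return $best;
}
var_dump(string_max($values)); // string(4) "8000"
```

The `rsort()` route is the simplest, while the `strcmp()` loop makes a single pass instead of a full sort.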
 [2016-08-15 10:04 UTC] cmb@php.net
-Assigned To:
+Assigned To: cmb
 [2016-08-15 10:04 UTC] cmb@php.net
Regarding the non-transitive behavior of comparisons, see also request
#71090.
 [2016-08-15 10:06 UTC] cmb@php.net
Automatic comment from SVN on behalf of cmb
Revision: http://svn.php.net/viewvc/?view=revision&revision=339882
Log: Fix #72835: Note on min/max's page misleading about string comparisons
 [2016-08-15 10:06 UTC] cmb@php.net
-Status: Assigned
+Status: Closed
 [2020-02-07 06:06 UTC] phpdocbot@php.net
Automatic comment on behalf of cmb
Revision: http://git.php.net/?p=doc/en.git;a=commit;h=6c604e6e40ef6ccb4f09056fc277fd38fd3cb91e
Log: Fix #72835: Note on min/max's page misleading about string comparisons
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Apr 19 16:01:27 2024 UTC