Bug #65061 collator_sort/asort producing incorrect results when sorting latin characters
Submitted: 2013-06-19 09:55 UTC
Modified: 2021-01-07 13:26 UTC
From: jithin1987 at gmail dot com
Assigned:
Status: Not a bug
Package: intl (PECL)
PHP Version: 5.4.16
OS: redhat 6
Private report: No
CVE-ID: None
 [2013-06-19 09:55 UTC] jithin1987 at gmail dot com
Description:
------------
When the locale is not en_US and the input array's keys are not sequential 
integers, collator_sort/asort does not produce the correct results.

Test script:
---------------
<?php
$coll = collator_create("ru_RU");
$array = array( "n", "", "d", "nn" );
print_r("Input \n");
print_r($array);
$res_val = collator_asort( $coll, $array);
print_r("Output\n");
print_r($array);
$array2 = array(
  '1' => 'п',
  '4' => '',
  '7' => 'd',
  '2' => 'пп' );
print_r("Input\n");
print_r($array2);
$res_val = collator_asort( $coll, $array2);
print_r("Output\n");
print_r($array2);
?>

Expected result:
----------------
I expected both output arrays to be sorted the same way. Specifically, output 
2 from the test script above should be:

Output 2
Array
(
    [4] =>
    [7] => d
    [1] => п
    [2] => пп
)

Actual result:
--------------
Output 2
Array
(
    [4] =>
    [1] => п
    [2] => пп
    [7] => d
)

Running it under GDB, I found something odd about the values being 
compared.

backtrace 

Breakpoint 3, collator_regular_compare_function (result=0x7fffffffa1a0, op1=0xa3a978, op2=0xa3ae50) at /home/jithine/sources/libs/icu/branches/yphp_intl-3/source/collator/collator_sort.c:89
89                      result->type = IS_LONG;
(gdb) print (*result).value
$96 = {lval = 1, dval = 4.9406564584124654e-324, str = {val = 0x1 <Address 0x1 out of bounds>, len = 0}, ht = 0x1, obj = {handle = 1, handlers = 0x0}}
(gdb) print *str1
$97 = {value = {lval = 10726312, dval = 5.2995022657747129e-317, str = {val = 0xa3aba8 "d", len = 2}, ht = 0xa3aba8, obj = {handle = 10726312, handlers = 0x2}}, refcount__gc = 3, type = 6 '\006',
  is_ref__gc = 0 '\000'}
(gdb) print *str2
$98 = {value = {lval = 10726960, dval = 5.299822420313218e-317, str = {val = 0xa3ae30 "?\004", len = 2}, ht = 0xa3ae30, obj = {handle = 10726960, handlers = 0x2}}, refcount__gc = 2, type = 6 '\006',
  is_ref__gc = 0 '\000'}
(gdb) c

Here the value of *str2 should have been 'n', but it shows as "?\004". The 
comparison above should have identified 'd' as the lesser value, but the 
opposite is happening.


 [2013-06-20 10:40 UTC] jithin1987 at gmail dot com
-Summary: php intl v3 test case failures with icu 51-2
+Summary: collator_sort/asort producing incorrect results when sorting latin characters
-PHP Version: 5.3.26
+PHP Version: 5.4.16
 [2013-06-20 10:40 UTC] jithin1987 at gmail dot com
Correcting summary.
 [2013-06-21 09:48 UTC] jithin1987 at gmail dot com
I debugged further and found something even stranger:

<?php
$coll = collator_create("ru_RU");

$array2 = array(
  '1' => 'n',
  '4' => '',
  '7' => 'd',
  '8' => 'f',
  '9' => 'e',
  '2' => 'nn' );
$res_val = collator_asort( $coll, $array2);
print_r("Output 1\n");
print_r($array2);

$array2 = array(
  '1' => 'п',
  '4' => '',
  '7' => 'd',
  '2' => 'пп' );
$res_val = collator_asort( $coll, $array2);
print_r("Output 2\n");
print_r($array2);
?>

In the test program above, the first array is sorted correctly while the 
second array is sorted incorrectly.

Checking the values of the array being sorted inside 
collator_sort.c:collator_sort_internal(), I find that once 
collator_convert_hash_from_utf8_to_utf16() is called, the values of n and nn 
appear corrupted in the case of the second array.


Value of nn before collator_convert_hash_from_utf8_to_utf16()

Breakpoint 3, collator_sort_internal (renumber=0, ht=2, return_value=0x7ffff7fbfde0, return_value_ptr=0x0, this_ptr=0x0, return_value_used=1)
    at /home/jithine/sources/libs/icu/branches/yphp_intl-3/source/collator/collator_sort.c:318
318             collator_convert_hash_from_utf8_to_utf16( hash, COLLATOR_ERROR_CODE_P( co ) );
(gdb) print  **(zval **)(*(Bucket *)hash.arBuckets[2]).pData
$78 = {value = {lval = 140737352781048, dval = 6.9533491095755836e-310, str = {val = 0x7ffff7eb4cf8 "пп", len = 4}, ht = 0x7ffff7eb4cf8, obj = {handle = 4159393016, handlers = 0x4}},
  refcount__gc = 1, type = 6 '\006', is_ref__gc = 0 '\000'}

Value of nn after collator_convert_hash_from_utf8_to_utf16()

(gdb) n
319             COLLATOR_CHECK_STATUS( co, "Error converting hash from UTF-8 to UTF-16" );
(gdb) print  **(zval **)(*(Bucket *)hash.arBuckets[2]).pData
$79 = {value = {lval = 140737353877760, dval = 6.9533491637603558e-310, str = {val = 0x7ffff7fc0900 "?\004?\004", len = 4}, ht = 0x7ffff7fc0900, obj = {handle = 4160489728,
      handlers = 0x7fff00000004}}, refcount__gc = 1, type = 6 '\006', is_ref__gc = 0 '\000'}

Any idea what is happening?
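
For reference, the "?\004?\004" printed by gdb is consistent with the UTF-16LE encoding of two Cyrillic п characters (U+043F): each one is stored as the bytes 0x3F ('?') and 0x04. The byte length is 4 both before and after the conversion, which matches the len = 4 in both dumps. A minimal sketch for checking this from PHP, assuming the iconv extension is available (iconv is not used anywhere in the original report):

```php
<?php
// Assumption: the iconv extension is loaded and this file is saved as UTF-8.
$cyrillic = 'пп';   // two Cyrillic letters pe (U+043F), as in the second array
$latin    = 'nn';   // two Latin letters n (U+006E), as in the first array

// Convert to UTF-16LE, the in-memory form gdb displayed after the conversion step.
$cyrillic16 = iconv('UTF-8', 'UTF-16LE', $cyrillic);
$latin16    = iconv('UTF-8', 'UTF-16LE', $latin);

echo bin2hex($cyrillic), "\n";    // d0bfd0bf
echo bin2hex($cyrillic16), "\n";  // 3f043f04
echo bin2hex($latin16), "\n";     // 6e006e00

// Render control bytes as octal escapes, the way gdb prints a C string.
echo addcslashes($cyrillic16, "\0..\37"), "\n";  // ?\004?\004
```

A Latin "nn" would instead start with the byte 'n' followed by a NUL, so gdb would have displayed it as "n".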
 [2013-06-21 09:57 UTC] jithin1987 at gmail dot com
-Status: Open
+Status: Closed
 [2013-06-21 09:57 UTC] jithin1987 at gmail dot com
I realized that this is not a valid bug. I wasn't paying close attention to the 
input data and mistook п (Cyrillic) for n (Latin).
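
A look-alike mix-up of this kind can be caught by inspecting the bytes and scripts of the input before sorting. The following is only a sketch: describe_scripts() is a hypothetical helper (not part of intl), and it relies on PCRE Unicode script properties being available in the PHP build.

```php
<?php
// Hypothetical helper: report which scripts each value contains and its raw UTF-8 bytes.
// Assumption: this file is saved as UTF-8 and PCRE was built with Unicode support.
function describe_scripts(array $values): array
{
    $report = [];
    foreach ($values as $key => $value) {
        $scripts = [];
        if (preg_match('/\p{Latin}/u', $value)) {
            $scripts[] = 'Latin';
        }
        if (preg_match('/\p{Cyrillic}/u', $value)) {
            $scripts[] = 'Cyrillic';
        }
        $report[$key] = [
            'scripts'  => $scripts,
            'utf8_hex' => bin2hex($value),
        ];
    }
    return $report;
}

// The second array from the report: keys 1 and 2 hold Cyrillic text, key 7 holds Latin.
$array2 = ['1' => 'п', '4' => '', '7' => 'd', '2' => 'пп'];
print_r(describe_scripts($array2));
```

In the printed report, 'п' shows up as Cyrillic with bytes d0bf, while 'd' shows up as Latin with byte 64, which makes the difference visible even when the glyphs look alike.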
 [2021-01-07 13:26 UTC] cmb@php.net
-Status: Closed
+Status: Not a bug
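
To see the two cases side by side, the reproduction can be reduced to one Latin and one Cyrillic array with identical keys. This is a sketch that assumes the intl extension is loaded; the order of the mixed-script array depends on the ICU collation data in use, so no output is claimed for it here beyond what the report itself shows.

```php
<?php
// Assumption: the intl extension is loaded and this file is saved as UTF-8.
$coll = collator_create('ru_RU');

$latin    = ['1' => 'n', '4' => '', '7' => 'd', '2' => 'nn'];  // Latin n (U+006E)
$cyrillic = ['1' => 'п', '4' => '', '7' => 'd', '2' => 'пп'];  // Cyrillic п (U+043F)

// asort keeps the key => value association while ordering by value.
collator_asort($coll, $latin);
collator_asort($coll, $cyrillic);

print_r($latin);     // all-Latin input: '', d, n, nn
print_r($cyrillic);  // mixed scripts: order follows the locale's collation rules

// Negative means the first argument sorts before the second, positive means after.
var_dump(collator_compare($coll, 'п', 'd'));
```

In the report's own output, п and пп are placed before d under ru_RU, which is the behaviour that was initially mistaken for a bug.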
 