Bug #65061 collator_sort/asort producing incorrect results when sorting latin characters
Submitted: 2013-06-19 09:55 UTC Modified: 2021-01-07 13:26 UTC
From: jithin1987 at gmail dot com Assigned:
Status: Not a bug Package: intl (PECL)
PHP Version: 5.4.16 OS: redhat 6
Private report: No CVE-ID: None
 [2013-06-19 09:55 UTC] jithin1987 at gmail dot com
Description:
------------
When the locale is not en_US and the input array's keys are not sequential 
integers, collator_sort/asort does not produce the correct results.

Test script:
---------------
<?php
$coll = collator_create("ru_RU");
$array = array( "n", "", "d", "nn" );
print_r("Input \n");
print_r($array);
$res_val = collator_asort( $coll, $array);
print_r("Output\n");
print_r($array);
$array2 = array(
  '1' => 'п',
  '4' => '',
  '7' => 'd',
  '2' => 'пп' );
print_r("Input\n");
print_r($array2);
$res_val = collator_asort( $coll, $array2);
print_r("Output\n");
print_r($array2);
?>

Expected result:
----------------
I expected both output arrays to be sorted the same way. Specifically, Output 2 
from the test script above should be:

Output 2
Array
(
    [4] =>
    [7] => d
    [1] => п
    [2] => пп
)

Actual result:
--------------
Output 2
Array
(
    [4] =>
    [1] => п
    [2] => пп
    [7] => d
)

Running it under GDB, I found something odd about the values being compared.

backtrace 

Breakpoint 3, collator_regular_compare_function (result=0x7fffffffa1a0, op1=0xa3a978, op2=0xa3ae50)
    at /home/jithine/sources/libs/icu/branches/yphp_intl-3/source/collator/collator_sort.c:89
89                      result->type = IS_LONG;
(gdb) print (*result).value
$96 = {lval = 1, dval = 4.9406564584124654e-324, str = {val = 0x1 <Address 0x1 out of bounds>, len = 0}, ht = 0x1, obj = {handle = 1, handlers = 0x0}}
(gdb) print *str1
$97 = {value = {lval = 10726312, dval = 5.2995022657747129e-317, str = {val = 0xa3aba8 "d", len = 2}, ht = 0xa3aba8, obj = {handle = 10726312, handlers = 0x2}}, refcount__gc = 3, type = 6 '\006',
  is_ref__gc = 0 '\000'}
(gdb) print *str2
$98 = {value = {lval = 10726960, dval = 5.299822420313218e-317, str = {val = 0xa3ae30 "?\004", len = 2}, ht = 0xa3ae30, obj = {handle = 10726960, handlers = 0x2}}, refcount__gc = 2, type = 6 '\006',
  is_ref__gc = 0 '\000'}
(gdb) c

Here the value of *str2 should have been 'n', but it is shown as "?\004". The 
comparison above should have identified 'd' as the lesser value, but the 
opposite is happening.


 [2013-06-20 10:40 UTC] jithin1987 at gmail dot com
-Summary: php intl v3 test case failures with icu 51-2
+Summary: collator_sort/asort producing incorrect results when sorting latin characters
-PHP Version: 5.3.26
+PHP Version: 5.4.16
 [2013-06-20 10:40 UTC] jithin1987 at gmail dot com
Correcting summary.
 [2013-06-21 09:48 UTC] jithin1987 at gmail dot com
I debugged further and found something even stranger:

<?php
$coll = collator_create("ru_RU");

$array2 = array(
  '1' => 'n',
  '4' => '',
  '7' => 'd',
  '8' => 'f',
  '9' => 'e',
  '2' => 'nn' );
$res_val = collator_asort( $coll, $array2);
print_r("Output 1\n");
print_r($array2);

$array2 = array(
  '1' => 'п',
  '4' => '',
  '7' => 'd',
  '2' => 'пп' );
$res_val = collator_asort( $coll, $array2);
print_r("Output 2\n");
print_r($array2);
?>

In the above test program, the first array is sorted correctly while the second 
array is sorted incorrectly.

On checking the values of the array being sorted inside 
collator_sort.c:collator_sort_internal(), I find that once 
collator_convert_hash_from_utf8_to_utf16() is called, the values of n and nn 
get corrupted in the case of the second array.


Value of nn before collator_convert_hash_from_utf8_to_utf16()

Breakpoint 3, collator_sort_internal (renumber=0, ht=2, return_value=0x7ffff7fbfde0, return_value_ptr=0x0, this_ptr=0x0, return_value_used=1)
    at /home/jithine/sources/libs/icu/branches/yphp_intl-3/source/collator/collator_sort.c:318
318             collator_convert_hash_from_utf8_to_utf16( hash, COLLATOR_ERROR_CODE_P( co ) );
(gdb) print  **(zval **)(*(Bucket *)hash.arBuckets[2]).pData
$78 = {value = {lval = 140737352781048, dval = 6.9533491095755836e-310, str = {val = 0x7ffff7eb4cf8 "пп", len = 4}, ht = 0x7ffff7eb4cf8, obj = {handle = 4159393016, handlers = 0x4}},
  refcount__gc = 1, type = 6 '\006', is_ref__gc = 0 '\000'}

Value of nn after collator_convert_hash_from_utf8_to_utf16()

(gdb) n
319             COLLATOR_CHECK_STATUS( co, "Error converting hash from UTF-8 to UTF-16" );
(gdb) print  **(zval **)(*(Bucket *)hash.arBuckets[2]).pData
$79 = {value = {lval = 140737353877760, dval = 6.9533491637603558e-310, str = {val = 0x7ffff7fc0900 "?\004?\004", len = 4}, ht = 0x7ffff7fc0900, obj = {handle = 4160489728,
      handlers = 0x7fff00000004}}, refcount__gc = 1, type = 6 '\006', is_ref__gc = 0 '\000'}

Any idea what is happening?
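
A note on the GDB output above: the bytes "?\004" are what one would expect to 
see if the string held the Cyrillic letter п (U+043F) encoded as UTF-16LE, 
since 0x3F is the ASCII code for '?'. The following is a minimal sketch of that 
check, assuming the mbstring extension is available and PHP 7.0 or later (for 
the "\u{...}" escape). The variable names are illustrative and are not part of 
the original report.

```php
<?php
// U+043F is CYRILLIC SMALL LETTER PE, written as an escape so the
// example does not depend on the encoding of this source file.
$pe = "\u{043F}";

// Convert from UTF-8 to UTF-16LE. This is only similar in spirit to the
// conversion the collator performs before comparing strings (assumption).
$utf16 = mb_convert_encoding($pe, 'UTF-16LE', 'UTF-8');

echo bin2hex($pe), "\n";     // UTF-8 bytes of the letter: d0bf
echo bin2hex($utf16), "\n";  // UTF-16LE bytes of the letter: 3f04
echo chr(0x3F), "\n";        // first UTF-16LE byte read as ASCII: ?
```

Both encodings take two bytes for this letter, which would also be consistent 
with len = 4 for the two-letter string before and after the conversion.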
 [2013-06-21 09:57 UTC] jithin1987 at gmail dot com
-Status: Open
+Status: Closed
 [2013-06-21 09:57 UTC] jithin1987 at gmail dot com
I realized that this is an invalid bug. I wasn't paying close attention to the 
input data and mistook п (Cyrillic) for n (Latin).
 [2021-01-07 13:26 UTC] cmb@php.net
-Status: Closed
+Status: Not a bug
 